[jira] [Commented] (KAFKA-9594) speed up the processing of LeaderAndIsrRequest
[ https://issues.apache.org/jira/browse/KAFKA-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045151#comment-17045151 ] ASF GitHub Bot commented on KAFKA-9594: --- omkreddy commented on pull request #8153: KAFKA-9594: Add a separate lock to pause the follower log append while checking if the log dir could be replaced. URL: https://github.com/apache/kafka/pull/8153 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > speed up the processing of LeaderAndIsrRequest > -- > > Key: KAFKA-9594 > URL: https://issues.apache.org/jira/browse/KAFKA-9594 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: Manikumar >Priority: Minor > Fix For: 2.6.0 > > > Observations from [~junrao] > Currently, Partition.makeFollower() holds a write lock on > leaderIsrUpdateLock. Partition.doAppendRecordsToFollowerOrFutureReplica() > holds a read lock on leaderIsrUpdateLock. So, if there is an ongoing log > append on the follower, the makeFollower() call will be delayed. This path is > a bit different when serving the Partition.makeLeader() call. Before we make > a call on Partition.makeLeader(), we first remove the follower from the > replicaFetcherThread. So, the makeLeader() call won't be delayed because of a > log append. This means that when we change one follower to become leader and > another follower to follow the new leader during a controlled shutdown, the > makeLeader() call typically completes faster than the makeFollower() call, > which can delay the follower fetching from the new leader and cause the ISR to > shrink. > The only reason that Partition.doAppendRecordsToFollowerOrFutureReplica() > needs to hold a read lock on leaderIsrUpdateLock is for > Partition.maybeReplaceCurrentWithFutureReplica() to pause the log append > while checking if the log dir could be replaced. We could potentially add a > separate lock (something like futureLogLock) that both > maybeReplaceCurrentWithFutureReplica() and > doAppendRecordsToFollowerOrFutureReplica() synchronize on. Then, > doAppendRecordsToFollowerOrFutureReplica() doesn't need to hold the lock on > leaderIsrUpdateLock. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9594) speed up the processing of LeaderAndIsrRequest
[ https://issues.apache.org/jira/browse/KAFKA-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-9594. -- Resolution: Fixed Issue resolved by pull request 8153 [https://github.com/apache/kafka/pull/8153] > speed up the processing of LeaderAndIsrRequest > -- > > Key: KAFKA-9594 > URL: https://issues.apache.org/jira/browse/KAFKA-9594 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: Manikumar >Priority: Minor > Fix For: 2.6.0 > > > Observations from [~junrao] > Currently, Partition.makeFollower() holds a write lock on > leaderIsrUpdateLock. Partition.doAppendRecordsToFollowerOrFutureReplica() > holds a read lock on leaderIsrUpdateLock. So, if there is an ongoing log > append on the follower, the makeFollower() call will be delayed. This path is > a bit different when serving the Partition.makeLeader() call. Before we make > a call on Partition.makeLeader(), we first remove the follower from the > replicaFetcherThread. So, the makeLeader() call won't be delayed because of a > log append. This means that when we change one follower to become leader and > another follower to follow the new leader during a controlled shutdown, the > makeLeader() call typically completes faster than the makeFollower() call, > which can delay the follower fetching from the new leader and cause the ISR to > shrink. > The only reason that Partition.doAppendRecordsToFollowerOrFutureReplica() > needs to hold a read lock on leaderIsrUpdateLock is for > Partition.maybeReplaceCurrentWithFutureReplica() to pause the log append > while checking if the log dir could be replaced. We could potentially add a > separate lock (something like futureLogLock) that both > maybeReplaceCurrentWithFutureReplica() and > doAppendRecordsToFollowerOrFutureReplica() synchronize on. Then, > doAppendRecordsToFollowerOrFutureReplica() doesn't need to hold the lock on > leaderIsrUpdateLock. -- This message was sent by Atlassian Jira (v8.3.4#803005)
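The separate-lock proposal in KAFKA-9594 is easier to see as code. Below is a minimal, self-contained Java sketch of the idea only, not the actual Partition implementation: follower appends synchronize on a dedicated futureLogLock (the name suggested in the ticket), so makeFollower() can take the leaderIsrUpdateLock write lock without waiting behind an in-flight append, while maybeReplaceCurrentWithFutureReplica() can still pause appends when checking whether the future log can replace the current one. The class and method bodies are simplified stand-ins.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Locking sketch only; not Kafka's Partition class.
public class PartitionLockSketch {
    private final ReentrantReadWriteLock leaderIsrUpdateLock = new ReentrantReadWriteLock();
    private final Object futureLogLock = new Object();  // name taken from the ticket's suggestion

    public void makeFollower() {
        leaderIsrUpdateLock.writeLock().lock();
        try {
            // update leader/ISR state; no longer blocked behind an in-flight follower append
        } finally {
            leaderIsrUpdateLock.writeLock().unlock();
        }
    }

    public void appendRecordsToFollowerOrFutureReplica(byte[] records) {
        synchronized (futureLogLock) {
            // append records to the current or future log; leaderIsrUpdateLock is not taken here
        }
    }

    public void maybeReplaceCurrentWithFutureReplica() {
        synchronized (futureLogLock) {
            // appends are paused here, so it is safe to check whether the future log
            // has caught up and swap the log directories
        }
    }
}
{code}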
[jira] [Commented] (KAFKA-8713) [Connect] JsonConverter NULL Values are replaced by default values even in NULLABLE fields
[ https://issues.apache.org/jira/browse/KAFKA-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045149#comment-17045149 ] Cheng Pan commented on KAFKA-8713: -- [~rhauch], thanks for your reply. I totally agree with you that this is an incompatible change, and I also think that introducing a configuration property is a good choice. I'd like to follow the process and put together a KIP draft to push this issue forward. > [Connect] JsonConverter NULL Values are replaced by default values even in > NULLABLE fields > -- > > Key: KAFKA-8713 > URL: https://issues.apache.org/jira/browse/KAFKA-8713 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.3.0, 2.2.1 >Reporter: Cheng Pan >Priority: Major > Labels: needs-kip > > Class JsonConverter line: 582 > {code:java} > private static JsonNode convertToJson(Schema schema, Object logicalValue) > { > if (logicalValue == null) { > if (schema == null) // Any schema is valid and we don't have a > default, so treat this as an optional schema > return null; > if (schema.defaultValue() != null) > return convertToJson(schema, schema.defaultValue()); > if (schema.isOptional()) > return JsonNodeFactory.instance.nullNode(); > throw new DataException("Conversion error: null value for field > that is required and has no default value"); > } > > } > {code} > h1. Expected: > Value `null` is valid for an optional field, even though the field has a > default value. > Only when the field is required should the converter fall back to the > default value when the value is `null`. > h1. Actual: > The default value is always returned if `null` was given. > h1. Example: > I'm not sure if the current behavior is exactly what was intended, but at least on > MySQL, with a table defined as > {code:sql} > create table t1 ( >name varchar(40) not null, >create_time datetime default '1999-01-01 11:11:11' null, >update_time datetime default '1999-01-01 11:11:11' null > ) > {code} > Just insert a record: > {code:sql} > INSERT INTO `t1` (`name`, `update_time`) VALUES ('kafka', null); > {code} > The result is: > {code:json} > { > "name": "kafka", > "create_time": "1999-01-01 11:11:11", > "update_time": null > } > {code} > But when I use Debezium to pull the binlog and send the record to Kafka with > JsonConverter, the result changes to: > {code:json} > { > "name": "kafka", > "create_time": "1999-01-01 11:11:11", > "update_time": "1999-01-01 11:11:11" > } > {code} > For more details, see: https://issues.jboss.org/browse/DBZ-1064 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9602) Incorrect close of producer instance during partition assignment
[ https://issues.apache.org/jira/browse/KAFKA-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045131#comment-17045131 ] ASF GitHub Bot commented on KAFKA-9602: --- guozhangwang commented on pull request #8166: KAFKA-9602: Close the stream internal producer only in EOS URL: https://github.com/apache/kafka/pull/8166 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Incorrect close of producer instance during partition assignment > > > Key: KAFKA-9602 > URL: https://issues.apache.org/jira/browse/KAFKA-9602 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > Closing the new StreamProducer instance doesn't distinguish between an > EOS and a non-EOS shutdown. The StreamProducer should take care of that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9602) Incorrect close of producer instance during partition assignment
[ https://issues.apache.org/jira/browse/KAFKA-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-9602. -- Resolution: Fixed > Incorrect close of producer instance during partition assignment > > > Key: KAFKA-9602 > URL: https://issues.apache.org/jira/browse/KAFKA-9602 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > Closing the new StreamProducer instance doesn't distinguish between an > EOS and a non-EOS shutdown. The StreamProducer should take care of that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
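The shutdown rule implied by the PR title ("close the stream internal producer only in EOS") can be sketched as follows. This is not the actual StreamsProducer/TaskManager code; the class and method names are placeholders. The underlying assumption, which matches the Streams producer model of these versions, is that with EOS each task owns its own producer and must close it on revocation, while without EOS the producer is shared by the whole stream thread and must be left open.
{code:java}
import org.apache.kafka.clients.producer.Producer;

// Hedged sketch of the intended close behavior, not the shipped Streams code.
final class TaskProducerSketch {
    private final Producer<byte[], byte[]> producer;
    private final boolean eosEnabled;

    TaskProducerSketch(Producer<byte[], byte[]> producer, boolean eosEnabled) {
        this.producer = producer;
        this.eosEnabled = eosEnabled;
    }

    void closeOnRevocation() {
        if (eosEnabled) {
            producer.close();   // task-owned producer under EOS: safe and necessary to close
        }
        // non-EOS: the producer is shared across all tasks of the thread, so keep it open
    }
}
{code}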
[jira] [Resolved] (KAFKA-9498) Topic validation during the creation trigger unnecessary TopicChange events
[ https://issues.apache.org/jira/browse/KAFKA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao resolved KAFKA-9498. Fix Version/s: 2.6.0 Resolution: Fixed Merged the PR to trunk. > Topic validation during the creation trigger unnecessary TopicChange events > > > Key: KAFKA-9498 > URL: https://issues.apache.org/jira/browse/KAFKA-9498 > Project: Kafka > Issue Type: Bug >Reporter: David Jacot >Assignee: David Jacot >Priority: Major > Fix For: 2.6.0 > > > I have found out that the topic validation logic, which is executed when a > CreateTopicPolicy is used or when validateOnly is set, triggers unnecessary > TopicChange events in the controller. In the worst case, it can trigger up to > one event per created topic, which can overload the controller. > This happens because the validation logic reads all the topics from ZK using > the method getAllTopicsInCluster provided by the KafkaZkClient. This method > registers a watch every time the topics are read from ZooKeeper. > I think that we should make the watch registration optional for this call in > order to avoid this unwanted behaviour. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
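As a rough illustration of the fix direction, the sketch below shows the idea at the ZooKeeper-client level: the caller decides whether reading the topic list should also register a child watch. The real change lives in KafkaZkClient's getAllTopicsInCluster, so the helper class and method here are hypothetical; only the /brokers/topics znode path and the ZooKeeper getChildren(path, watch) call are taken as given.
{code:java}
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Sketch of making watch registration a caller choice; not the actual KafkaZkClient code.
final class TopicListingSketch {
    static List<String> allTopics(ZooKeeper zk, boolean registerWatch)
            throws KeeperException, InterruptedException {
        // Passing watch=false avoids re-arming the TopicChange watch on every
        // validation-only read of the topic list.
        return zk.getChildren("/brokers/topics", registerWatch);
    }
}
{code}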
[jira] [Commented] (KAFKA-9498) Topic validation during the creation trigger unnecessary TopicChange events
[ https://issues.apache.org/jira/browse/KAFKA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045071#comment-17045071 ] ASF GitHub Bot commented on KAFKA-9498: --- junrao commented on pull request #8062: KAFKA-9498; Topic validation during the topic creation triggers unnecessary TopicChange events URL: https://github.com/apache/kafka/pull/8062 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Topic validation during the creation trigger unnecessary TopicChange events > > > Key: KAFKA-9498 > URL: https://issues.apache.org/jira/browse/KAFKA-9498 > Project: Kafka > Issue Type: Bug >Reporter: David Jacot >Assignee: David Jacot >Priority: Major > > I have found out that the topic validation logic, which is executed when a > CreateTopicPolicy is used or when validateOnly is set, triggers unnecessary > TopicChange events in the controller. In the worst case, it can trigger up to > one event per created topic, which can overload the controller. > This happens because the validation logic reads all the topics from ZK using > the method getAllTopicsInCluster provided by the KafkaZkClient. This method > registers a watch every time the topics are read from ZooKeeper. > I think that we should make the watch registration optional for this call in > order to avoid this unwanted behaviour. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9572) Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some Records
[ https://issues.apache.org/jira/browse/KAFKA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-9572. -- Fix Version/s: (was: 2.5.0) 2.6.0 Resolution: Fixed > Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some > Records > --- > > Key: KAFKA-9572 > URL: https://issues.apache.org/jira/browse/KAFKA-9572 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Bruno Cadonna >Assignee: Guozhang Wang >Priority: Blocker > Fix For: 2.6.0 > > Attachments: 7-changelog-1.txt, data-1.txt, streams22.log, > streams23.log, streams30.log, sum-1.txt > > > System test {{StreamsEosTest.test_failure_and_recovery}} failed due to a > wrongly computed aggregation under exactly-once (EOS). The specific error is: > {code:java} > Exception in thread "main" java.lang.RuntimeException: Result verification > failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset > = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value > size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = > [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698> > at > org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444) > at > org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196) > at > org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69) > {code} > That means, the sum computed by the Streams app seems to be wrong for key > 6069. I checked the dumps of the log segments of the input topic partition > (attached: data-1.txt) and indeed two input records are not considered in the > sum. With those two missed records the sum would be correct. More concretely, > the input values for key 6069 are: > # 147 > # 9250 > # 5340 > # 1231 > # 1301 > The sum of this values is 17269 as stated in the exception above as expected > sum. If you subtract values 3 and 4, i.e., 5340 and 1231 from 17269, you get > 10698 , which is the actual sum in the exception above. Somehow those two > values are missing. > In the log dump of the output topic partition (attached: sum-1.txt), the sum > is correct until the 4th value 1231 , i.e. 15968, then it is overwritten with > 10698. > In the log dump of the changelog topic of the state store that stores the sum > (attached: 7-changelog-1.txt), the sum is also overwritten as in the output > topic. > I attached the logs of the three Streams instances involved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9610) Should not throw illegal state exception during task revocation
[ https://issues.apache.org/jira/browse/KAFKA-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045002#comment-17045002 ] ASF GitHub Bot commented on KAFKA-9610: --- abbccdda commented on pull request #8169: KAFKA-9610: do not throw illegal state when remaining partitions are not empty URL: https://github.com/apache/kafka/pull/8169 For `handleRevocation`, it is possible that a previous onAssignment callback has already cleaned up the stream tasks, which means no corresponding task can be found for the given partitions. We should not throw here, as this is expected behavior. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Should not throw illegal state exception during task revocation > --- > > Key: KAFKA-9610 > URL: https://issues.apache.org/jira/browse/KAFKA-9610 > Project: Kafka > Issue Type: Bug >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > In the handleRevocation call, the remaining partitions could cause an illegal > state exception on task revocation. This should also be fixed, as it is > expected when the tasks have been cleared by the assignor's onAssignment > callback. -- This message was sent by Atlassian Jira (v8.3.4#803005)
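A minimal, self-contained sketch of the behavior described in the PR comment follows. It is not the actual TaskManager code (the types are simplified stand-ins), but it shows the intended handling: partitions whose task was already cleaned up by a prior onAssignment callback are skipped rather than treated as an illegal state.
{code:java}
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Sketch of tolerant revocation handling; not the actual Streams TaskManager.
final class RevocationSketch<T> {
    private final Map<Integer, T> taskByPartition = new HashMap<>();

    void handleRevocation(Collection<Integer> revokedPartitions) {
        for (Integer partition : revokedPartitions) {
            T task = taskByPartition.get(partition);
            if (task == null) {
                // Task was already removed by an earlier onAssignment cleanup:
                // expected, not an error, so just skip it.
                continue;
            }
            // suspend/commit the task as needed, then forget the mapping
            taskByPartition.remove(partition);
        }
    }
}
{code}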
[jira] [Updated] (KAFKA-9607) Should not clear partition queue during task close
[ https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9607: --- Description: We detected an issue with a corrupted task failed to revive: {code:java} [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,137] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle new assignment with: New active tasks: [0_0, 3_1] New standby tasks: [] Existing active tasks: [0_0] Existing standby tasks: [] (org.apache.kafka.streams.processor.internals.TaskManager) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,138] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] [Consumer clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, groupId=stream-soak-test] Adding newly assigned partitions: k8sName-id-repartition-1 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,138] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State transition from RUNNING to PARTITIONS_ASSIGNED (org.apache.kafka.streams.processor.internals.StreamThread) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,419] WARN [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException fetching records from restore consumer for partitions [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it is likely that the consumer's position has fallen out of the topic partition offset range because the topic was truncated or compacted on the broker, marking the corresponding tasks as corrupted and re-initializingit later. (org.apache.kafka.streams.processor.internals.StoreChangelogReader) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,139] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] [Consumer clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, groupId=stream-soak-test] Setting offset for partition k8sName-id-repartition-1 to the committed offset FetchPosition{offset=3592242, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092 (id: 1003 rack: null)], epoch=absent}} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,463] ERROR [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down: (org.apache.kafka.streams.processor.internals.StreamThread) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found. 
at org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99) at org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651) at org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631) at org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209) at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725) {code} The root cause is that we accidentally cleanup the partition group map so that next time we reboot the task it would miss input partitions. By avoiding clean up the partition group, we may have a slight overhead for GC which is ok. In terms of correctness, currently there is no way to revive the task with partitions reassigned. was: We detected an issue with a corrupted task failed to revive: {code:java} [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57
[jira] [Updated] (KAFKA-9607) Should not clear partition queue during task close
[ https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9607: --- Summary: Should not clear partition queue during task close (was: Should not clear partition queue or throw illegal state exception during task revocation) > Should not clear partition queue during task close > -- > > Key: KAFKA-9607 > URL: https://issues.apache.org/jira/browse/KAFKA-9607 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > We detected an issue with a corrupted task failed to revive: > {code:java} > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,137] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle > new assignment with: > New active tasks: [0_0, 3_1] > New standby tasks: [] > Existing active tasks: [0_0] > Existing standby tasks: [] > (org.apache.kafka.streams.processor.internals.TaskManager) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,138] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > [Consumer > clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, > groupId=stream-soak-test] Adding newly assigned partitions: > k8sName-id-repartition-1 > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,138] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State > transition from RUNNING to PARTITIONS_ASSIGNED > (org.apache.kafka.streams.processor.internals.StreamThread) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,419] WARN > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException > fetching records from restore consumer for partitions > [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it > is likely that the consumer's position has fallen out of the topic partition > offset range because the topic was truncated or compacted on the broker, > marking the corresponding tasks as corrupted and re-initializingit later. 
> (org.apache.kafka.streams.processor.internals.StoreChangelogReader) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,139] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > [Consumer > clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, > groupId=stream-soak-test] Setting offset for partition > k8sName-id-repartition-1 to the committed offset > FetchPosition{offset=3592242, offsetEpoch=Optional.empty, > currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092 > (id: 1003 rack: null)], epoch=absent}} > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,463] ERROR > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > Encountered the following exception during processing and the thread is going > to shut down: (org.apache.kafka.streams.processor.internals.StreamThread) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) > java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found. > at > org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99) > at > org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651) > at > org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631) > at > org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209) > at > org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270) > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnce(St
[jira] [Updated] (KAFKA-9607) Should not clear partition queue or throw illegal state exception during task revocation
[ https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9607: --- Description: We detected an issue with a corrupted task failed to revive: {code:java} [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,137] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle new assignment with: New active tasks: [0_0, 3_1] New standby tasks: [] Existing active tasks: [0_0] Existing standby tasks: [] (org.apache.kafka.streams.processor.internals.TaskManager) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,138] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] [Consumer clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, groupId=stream-soak-test] Adding newly assigned partitions: k8sName-id-repartition-1 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,138] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State transition from RUNNING to PARTITIONS_ASSIGNED (org.apache.kafka.streams.processor.internals.StreamThread) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,419] WARN [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException fetching records from restore consumer for partitions [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it is likely that the consumer's position has fallen out of the topic partition offset range because the topic was truncated or compacted on the broker, marking the corresponding tasks as corrupted and re-initializingit later. (org.apache.kafka.streams.processor.internals.StoreChangelogReader) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,139] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] [Consumer clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, groupId=stream-soak-test] Setting offset for partition k8sName-id-repartition-1 to the committed offset FetchPosition{offset=3592242, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092 (id: 1003 rack: null)], epoch=absent}} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,463] ERROR [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down: (org.apache.kafka.streams.processor.internals.StreamThread) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found. 
at org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99) at org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651) at org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631) at org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209) at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725) {code} The root cause is that we accidentally cleanup the partition group map so that next time we reboot the task it would miss input partitions. was: 1. We detected an issue with a corrupted task failed to revive: {code:java} [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,137] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32
[jira] [Created] (KAFKA-9610) Should not throw illegal state exception during task revocation
Boyang Chen created KAFKA-9610: -- Summary: Should not throw illegal state exception during task revocation Key: KAFKA-9610 URL: https://issues.apache.org/jira/browse/KAFKA-9610 Project: Kafka Issue Type: Bug Reporter: Boyang Chen Assignee: Boyang Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9610) Should not throw illegal state exception during task revocation
[ https://issues.apache.org/jira/browse/KAFKA-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9610: --- Description: In the handleRevocation call, the remaining partitions could cause an illegal state exception on task revocation. This should also be fixed, as it is expected when the tasks have been cleared by the assignor's onAssignment callback. > Should not throw illegal state exception during task revocation > --- > > Key: KAFKA-9610 > URL: https://issues.apache.org/jira/browse/KAFKA-9610 > Project: Kafka > Issue Type: Bug >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > In the handleRevocation call, the remaining partitions could cause an illegal > state exception on task revocation. This should also be fixed, as it is > expected when the tasks have been cleared by the assignor's onAssignment > callback. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-7906) Improve failed leader election logging
[ https://issues.apache.org/jira/browse/KAFKA-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Agam Brahma reassigned KAFKA-7906: -- Assignee: Agam Brahma > Improve failed leader election logging > -- > > Key: KAFKA-7906 > URL: https://issues.apache.org/jira/browse/KAFKA-7906 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Agam Brahma >Priority: Major > > We often see annoying log messages like the following in the state change log: > {code} > [2019-02-05 00:02:51,307] ERROR [Controller id=13 epoch=14] Controller 13 > epoch 14 failed to change state for partition topic-3 from OnlinePartition to > OnlinePartition > (state.change.logger) > kafka.common.StateChangeFailedException: Failed to elect leader for partition > topic-3 under strategy PreferredReplicaPartitionLeaderElectionStrategy > at > kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:328) > at > kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:326) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:326) > at > kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:254) > at > kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:175) > at > kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116) > {code} > The stack trace is not adding any value and the message doesn't explain why > the election failed. You have to read the code to figure it out. It's also > curious that you have to look in the state change log for failed leader > elections in the first place. It would be more intuitive to put these in the > controller log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9607) Should not clear partition queue or throw illegal state exception during task revocation
[ https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9607: --- Description: 1. We detected an issue with a corrupted task failed to revive: {code:java} [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,137] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle new assignment with: New active tasks: [0_0, 3_1] New standby tasks: [] Existing active tasks: [0_0] Existing standby tasks: [] (org.apache.kafka.streams.processor.internals.TaskManager) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,138] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] [Consumer clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, groupId=stream-soak-test] Adding newly assigned partitions: k8sName-id-repartition-1 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,138] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State transition from RUNNING to PARTITIONS_ASSIGNED (org.apache.kafka.streams.processor.internals.StreamThread) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,419] WARN [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException fetching records from restore consumer for partitions [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it is likely that the consumer's position has fallen out of the topic partition offset range because the topic was truncated or compacted on the broker, marking the corresponding tasks as corrupted and re-initializingit later. (org.apache.kafka.streams.processor.internals.StoreChangelogReader) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,139] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] [Consumer clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, groupId=stream-soak-test] Setting offset for partition k8sName-id-repartition-1 to the committed offset FetchPosition{offset=3592242, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092 (id: 1003 rack: null)], epoch=absent}} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,463] ERROR [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down: (org.apache.kafka.streams.processor.internals.StreamThread) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found. 
at org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99) at org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651) at org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631) at org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209) at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725) {code} The root cause is that we accidentally cleanup the partition group map so that next time we reboot the task it would miss input partitions. 2. Another issue tracked in this ticket is that the remaining partitions could cause an illegal state exception on task revocation. This should also be fixed as it is expected when the tasks are cleared from the assignor onAssignment callback. was: We detected an issue with a corrupted task failed to revive: {code:java} [2020-0
[jira] [Created] (KAFKA-9609) Memory Leak in Kafka Producer
Satish created KAFKA-9609: - Summary: Memory Leak in Kafka Producer Key: KAFKA-9609 URL: https://issues.apache.org/jira/browse/KAFKA-9609 Project: Kafka Issue Type: Bug Components: clients, producer Affects Versions: 2.4.0 Reporter: Satish org.apache.kafka.clients.producer.internals.Sender adds topic metrics for every topic that we write messages to, but they are never cleaned up until we close the producer. This is an issue if we use a single producer and write messages to a large number of dynamic topics (e.g. ~500 topics per hour). As this metrics map accumulates an entry for every topic, we observe memory usage increasing gradually over time. It can easily be reproduced by writing messages to a large number of dynamic topics using the same KafkaProducer from the Apache Kafka client libraries, or KafkaTemplate from Spring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
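Based on the reporter's description, a reproduction would look roughly like the sketch below: a single long-lived producer writing to many distinct topic names, after which the producer's metrics registry still holds per-topic entries. The broker address and topic-naming scheme are placeholders, and the sketch assumes topic auto-creation is enabled (or that the topics already exist).
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Hedged reproduction sketch for the described metrics growth, not a definitive test case.
public class DynamicTopicProducerRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10_000; i++) {
                String topic = "dynamic-topic-" + i;                  // a new topic name each iteration
                producer.send(new ProducerRecord<>(topic, "key", "value"));
            }
            // Inspect producer.metrics() (or take a heap dump) here: it still holds
            // per-topic entries for every topic written above.
        }
    }
}
{code}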
[jira] [Updated] (KAFKA-9607) Should not clear partition queue or throw illegal state exception during task revocation
[ https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9607: --- Summary: Should not clear partition queue or throw illegal state exception during task revocation (was: Should not clear partition group if the task will be revived again) > Should not clear partition queue or throw illegal state exception during task > revocation > > > Key: KAFKA-9607 > URL: https://issues.apache.org/jira/browse/KAFKA-9607 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > We detected an issue with a corrupted task failed to revive: > {code:java} > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,137] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle > new assignment with: > New active tasks: [0_0, 3_1] > New standby tasks: [] > Existing active tasks: [0_0] > Existing standby tasks: [] > (org.apache.kafka.streams.processor.internals.TaskManager) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,138] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > [Consumer > clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, > groupId=stream-soak-test] Adding newly assigned partitions: > k8sName-id-repartition-1 > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,138] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State > transition from RUNNING to PARTITIONS_ASSIGNED > (org.apache.kafka.streams.processor.internals.StreamThread) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,419] WARN > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException > fetching records from restore consumer for partitions > [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it > is likely that the consumer's position has fallen out of the topic partition > offset range because the topic was truncated or compacted on the broker, > marking the corresponding tasks as corrupted and re-initializingit later. 
> (org.apache.kafka.streams.processor.internals.StoreChangelogReader) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,139] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > [Consumer > clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, > groupId=stream-soak-test] Setting offset for partition > k8sName-id-repartition-1 to the committed offset > FetchPosition{offset=3592242, offsetEpoch=Optional.empty, > currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092 > (id: 1003 rack: null)], epoch=absent}} > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,463] ERROR > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > Encountered the following exception during processing and the thread is going > to shut down: (org.apache.kafka.streams.processor.internals.StreamThread) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) > java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found. > at > org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99) > at > org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651) > at > org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631) > at > org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209) > at > org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager
[jira] [Assigned] (KAFKA-9347) Detect deleted log directory before becoming leader
[ https://issues.apache.org/jira/browse/KAFKA-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Agam Brahma reassigned KAFKA-9347: -- Assignee: (was: Agam Brahma) > Detect deleted log directory before becoming leader > --- > > Key: KAFKA-9347 > URL: https://issues.apache.org/jira/browse/KAFKA-9347 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Priority: Major > Labels: needs-discussion > > There is currently no protection to prevent a broker whose log directory has > been deleted from becoming the leader of a partition that it still > remains in the ISR of. This scenario can happen when the last remaining > replica in the ISR is shut down. It will remain in the ISR and be eligible for > leadership as soon as it starts up. It would be useful to detect this > situation dynamically in order to force the user to either do an unclean > election or recover another broker. One option might be just to pass a flag > on startup to specify that a broker should not be eligible for leadership. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-8713) [Connect] JsonConverter NULL Values are replaced by default values even in NULLABLE fields
[ https://issues.apache.org/jira/browse/KAFKA-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Randall Hauch updated KAFKA-8713: - Labels: needs-kip (was: ) > [Connect] JsonConverter NULL Values are replaced by default values even in > NULLABLE fields > -- > > Key: KAFKA-8713 > URL: https://issues.apache.org/jira/browse/KAFKA-8713 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.3.0, 2.2.1 >Reporter: Cheng Pan >Priority: Major > Labels: needs-kip > > Class JsonConverter line: 582 > {code:java} > private static JsonNode convertToJson(Schema schema, Object logicalValue) > { > if (logicalValue == null) { > if (schema == null) // Any schema is valid and we don't have a > default, so treat this as an optional schema > return null; > if (schema.defaultValue() != null) > return convertToJson(schema, schema.defaultValue()); > if (schema.isOptional()) > return JsonNodeFactory.instance.nullNode(); > throw new DataException("Conversion error: null value for field > that is required and has no default value"); > } > > } > {code} > h1. Expected: > Value `null` is valid for an optional field, even though the field has a > default value. > Only when the field is required should the converter fall back to the > default value when the value is `null`. > h1. Actual: > The default value is always returned if `null` was given. > h1. Example: > I'm not sure if the current behavior is exactly what was intended, but at least on > MySQL, with a table defined as > {code:sql} > create table t1 ( >name varchar(40) not null, >create_time datetime default '1999-01-01 11:11:11' null, >update_time datetime default '1999-01-01 11:11:11' null > ) > {code} > Just insert a record: > {code:sql} > INSERT INTO `t1` (`name`, `update_time`) VALUES ('kafka', null); > {code} > The result is: > {code:json} > { > "name": "kafka", > "create_time": "1999-01-01 11:11:11", > "update_time": null > } > {code} > But when I use Debezium to pull the binlog and send the record to Kafka with > JsonConverter, the result changes to: > {code:json} > { > "name": "kafka", > "create_time": "1999-01-01 11:11:11", > "update_time": "1999-01-01 11:11:11" > } > {code} > For more details, see: https://issues.jboss.org/browse/DBZ-1064 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8713) [Connect] JsonConverter NULL Values are replaced by default values even in NULLABLE fields
[ https://issues.apache.org/jira/browse/KAFKA-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044972#comment-17044972 ] Randall Hauch commented on KAFKA-8713: -- [~pan3793], I agree that the current behavior is incorrect. But correcting this would change behavior, and we'll have to worry about how this affects existing users that upgrade to a version where this might get fixed. And AK's policy is that such changes require a KIP to identify all potential compatibility concerns. For example, we should consider whether we need to introduce a configuration property on the JSON converter that enables/disables this behavior so that existing users don't see any changes unless they want to enable it. > [Connect] JsonConverter NULL Values are replaced by default values even in > NULLABLE fields > -- > > Key: KAFKA-8713 > URL: https://issues.apache.org/jira/browse/KAFKA-8713 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.3.0, 2.2.1 >Reporter: Cheng Pan >Priority: Major > > Class JsonConverter line: 582 > {code:java} > private static JsonNode convertToJson(Schema schema, Object logicalValue) > { > if (logicalValue == null) { > if (schema == null) // Any schema is valid and we don't have a > default, so treat this as an optional schema > return null; > if (schema.defaultValue() != null) > return convertToJson(schema, schema.defaultValue()); > if (schema.isOptional()) > return JsonNodeFactory.instance.nullNode(); > throw new DataException("Conversion error: null value for field > that is required and has no default value"); > } > > } > {code} > h1. Expected: > Value `null` is valid for an optional field, even though the field has a > default value. > Only when the field is required should the converter fall back to the > default value when the value is `null`. > h1. Actual: > The default value is always returned if `null` was given. > h1. Example: > I'm not sure if the current behavior is exactly what was intended, but at least on > MySQL, with a table defined as > {code:sql} > create table t1 ( >name varchar(40) not null, >create_time datetime default '1999-01-01 11:11:11' null, >update_time datetime default '1999-01-01 11:11:11' null > ) > {code} > Just insert a record: > {code:sql} > INSERT INTO `t1` (`name`, `update_time`) VALUES ('kafka', null); > {code} > The result is: > {code:json} > { > "name": "kafka", > "create_time": "1999-01-01 11:11:11", > "update_time": null > } > {code} > But when I use Debezium to pull the binlog and send the record to Kafka with > JsonConverter, the result changes to: > {code:json} > { > "name": "kafka", > "create_time": "1999-01-01 11:11:11", > "update_time": "1999-01-01 11:11:11" > } > {code} > For more details, see: https://issues.jboss.org/browse/DBZ-1064 -- This message was sent by Atlassian Jira (v8.3.4#803005)
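To make the compatibility discussion concrete, here is a minimal, self-contained sketch of what a config-gated null-handling rule could look like. It is not the actual JsonConverter code, and the "replace null with default" flag is purely a placeholder until a KIP settles the name and default: with the flag on, the legacy default-substitution behavior is preserved; with it off, an optional field keeps its explicit null.
{code:java}
// Hedged sketch of the proposed, config-gated null handling; not the shipped converter.
public class NullDefaultSketch {

    static Object convertNull(boolean schemaIsOptional,
                              Object schemaDefault,
                              boolean replaceNullWithDefault /* hypothetical config */) {
        if (schemaDefault != null && replaceNullWithDefault)
            return schemaDefault;      // legacy behavior, kept for compatibility
        if (schemaIsOptional)
            return null;               // proposed behavior: preserve the explicit null
        if (schemaDefault != null)
            return schemaDefault;      // required field: the default is the only option
        throw new IllegalArgumentException(
            "null value for a field that is required and has no default value");
    }

    public static void main(String[] args) {
        // update_time is optional with a default: legacy mode fills in the default,
        // the proposed mode keeps the null that was actually written.
        System.out.println(convertNull(true, "1999-01-01 11:11:11", true));   // 1999-01-01 11:11:11
        System.out.println(convertNull(true, "1999-01-01 11:11:11", false));  // null
    }
}
{code}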
[jira] [Commented] (KAFKA-8458) Flaky Test AdminClientIntegrationTest#testElectPreferredLeaders
[ https://issues.apache.org/jira/browse/KAFKA-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044956#comment-17044956 ] Boyang Chen commented on KAFKA-8458: h3. Error Message org.junit.ComparisonFailure: expected:<... for topic partition[]> but was:<... for topic partition[.]> h3. Stacktrace org.junit.ComparisonFailure: expected:<... for topic partition[]> but was:<... for topic partition[.]> at org.junit.Assert.assertEquals(Assert.java:117) at org.junit.Assert.assertEquals(Assert.java:146) at kafka.api.PlaintextAdminIntegrationTest.testElectPreferredLeaders(PlaintextAdminIntegrationTest.scala:1288) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.lang.Thread.run(Thread.java:834) > Flaky Test AdminClientIntegrationTest#testElectPreferredLeaders > --- > > Key: KAFKA-8458 > URL: https://issues.apache.org/jira/browse/KAFKA-8458 > Project: Kafka > Issue Type: Bug > Components: admin, unit tests >Affects Versions: 2.2.0 >Reporter: Matthias J. Sax >Priority: Major > Labels: flaky-test > > Failed locally: > {code:java} > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout. 
> at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) > at > kafka.api.AdminClientIntegrationTest.testElectPreferredLeaders(AdminClientIntegrationTest.scala:1282) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.kafka.common.errors.TimeoutException: Aborted due to > timeout. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9596) Hanging test case `testMaxLogCompactionLag`
[ https://issues.apache.org/jira/browse/KAFKA-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044950#comment-17044950 ] Agam Brahma edited comment on KAFKA-9596 at 2/25/20 10:16 PM: -- Wasn't able to repro this failure. Passed on trunk (`5216da3d`) with 100 runs: {{$ for i in $(seq 1 100); do ./gradlew :core:cleanTest :core:test --tests "kafka.log.LogCleanerIntegrationTest.testMaxLogCompactionLag" | rg 'testMaxLogCompactionLag'; done > /tmp/testsummary.txt}} {{$ diff <(cat /tmp/testsummary.txt | wc -l) <(rg 'PASS' /tmp/testsummary.txt | wc -l) >/dev/null; echo $?}} {{0}} {{(will leave it open until this test hangs/fails again)}} was (Author: agam): Wasn't able to repro this failure. Passed on trunk (`5216da3d`) with 100 runs: {{$ for i in $(seq 1 100); do ./gradlew :core:cleanTest :core:test --tests }}{{"kafka.log.LogCleanerIntegrationTest.testMaxLogCompactionLag" | rg 'testMaxLogCompactionLag'; done > /tmp/testsummary.txt}} {{$ diff <(cat /tmp/testsummary.txt | wc -l) <(rg 'PASS' /tmp/testsummary.txt | wc -l) >/dev/null; echo $?}} {{0}} {{(will leave it open until this test hangs/fails again)}} > Hanging test case `testMaxLogCompactionLag` > --- > > Key: KAFKA-9596 > URL: https://issues.apache.org/jira/browse/KAFKA-9596 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Agam Brahma >Priority: Major > > Saw this on a recent build: > {code} > 15:18:59 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag STARTED > 18:19:25 Build timed out (after 270 minutes). Marking the build as aborted. > 18:19:25 Build was aborted > 18:19:25 [FINDBUGS] Skipping publisher since build result is ABORTED > 18:19:25 Recording test results > 18:19:25 Setting MAVEN_LATEST__HOME=/home/jenkins/tools/maven/latest/ > 18:19:25 Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3 > 18:19:27 > 18:19:27 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag SKIPPED > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9596) Hanging test case `testMaxLogCompactionLag`
[ https://issues.apache.org/jira/browse/KAFKA-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044950#comment-17044950 ] Agam Brahma commented on KAFKA-9596: Wasn't able to repro this failure. Passed on trunk (`5216da3d`) with 100 runs: {{$ for i in $(seq 1 100); do ./gradlew :core:cleanTest :core:test --tests }}{{"kafka.log.LogCleanerIntegrationTest.testMaxLogCompactionLag" | rg 'testMaxLogCompactionLag'; done > /tmp/testsummary.txt}} {{$ diff <(cat /tmp/testsummary.txt | wc -l) <(rg 'PASS' /tmp/testsummary.txt | wc -l) >/dev/null; echo $?}} {{0}} {{(will leave it open until this test hangs/fails again)}} > Hanging test case `testMaxLogCompactionLag` > --- > > Key: KAFKA-9596 > URL: https://issues.apache.org/jira/browse/KAFKA-9596 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Agam Brahma >Priority: Major > > Saw this on a recent build: > {code} > 15:18:59 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag STARTED > 18:19:25 Build timed out (after 270 minutes). Marking the build as aborted. > 18:19:25 Build was aborted > 18:19:25 [FINDBUGS] Skipping publisher since build result is ABORTED > 18:19:25 Recording test results > 18:19:25 Setting MAVEN_LATEST__HOME=/home/jenkins/tools/maven/latest/ > 18:19:25 Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3 > 18:19:27 > 18:19:27 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag SKIPPED > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9573) TestUpgrade system test failed on Java11.
[ https://issues.apache.org/jira/browse/KAFKA-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044913#comment-17044913 ] Nikolay Izhikov commented on KAFKA-9573: There was some merge conflicts after [9d53ad794de45739093070306f44df2a2f31e9e4|https://github.com/apache/kafka/commit/9d53ad794de45739093070306f44df2a2f31e9e4]. I resolve it and rerun {{upgrade_test.py}}. Results still the same(2 fails) {noformat} SESSION REPORT (ALL TESTS) ducktape version: 0.7.6 session_id: 2020-02-25--001 run time: 120 minutes 48.504 seconds tests run:31 passed: 29 failed: 2 ignored: 0 test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.8.2.2.to_message_format_version=None.compression_types=.none status: PASS run time: 5 minutes 52.846 seconds test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.8.2.2.to_message_format_version=None.compression_types=.snappy status: PASS run time: 3 minutes 42.862 seconds test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=None.security_protocol=SASL_SSL.compression_types=.none status: PASS run time: 5 minutes 48.920 seconds test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.0.1.to_message_format_version=None.compression_types=.lz4 status: PASS run time: 3 minutes 52.226 seconds test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.0.1.to_message_format_version=None.compression_types=.snappy status: PASS run time: 3 minutes 55.741 seconds test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.1.1.to_message_format_version=None.compression_types=.lz4 status: PASS run time: 3 minutes 56.117 seconds test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.1.1.to_message_format_version=None.compression_types=.snappy status: PASS run time: 4 minutes 0.577 seconds test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.2.2.to_message_format_version=0.10.2.2.compression_types=.snappy status: PASS run time: 3 minutes 52.603 seconds test_id: kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.2.2.to_message_format_version=0.9.0.1.compression_types=.none status: FAIL run time: 1 minute 20.193 seconds Kafka server didn't finish startup in 60 seconds Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 132, in run data = self.run_test() File "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 189, in run_test return self.test_context.function(self.test) File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 428, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/opt/kafka-dev/tests/kafkatest/tests/core/upgrade_test.py", line 133, in test_upgrade self.kafka.start() File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 250, in start Service.start(self) File "/usr/local/lib/python2.7/dist-packages/ducktape/services/service.py", line 234, in start self.start_node(node) File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 372, in start_node err_msg="Kafka server didn't finish startup in %d seconds" % timeout_sec) File "/usr/local/lib/python2.7/dist-packages/ducktape/cluster/remoteaccount.py", line 705, in wait_until 
allow_fail=True) == 0, **kwargs) File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 41, in wait_until raise TimeoutError(err_msg() if callable(err_msg) e
[jira] [Updated] (KAFKA-9047) AdminClient group operations may not respect backoff
[ https://issues.apache.org/jira/browse/KAFKA-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjana Kaundinya updated KAFKA-9047: - Reviewer: Jason Gustafson > AdminClient group operations may not respect backoff > > > Key: KAFKA-9047 > URL: https://issues.apache.org/jira/browse/KAFKA-9047 > Project: Kafka > Issue Type: Bug > Components: admin >Reporter: Jason Gustafson >Assignee: Sanjana Kaundinya >Priority: Major > > The retry logic for consumer group operations in the admin client is > complicated by the need to find the coordinator. Instead of a simple retry > loop that sends the same request over and over, we can get more complex > retry loops like the following: > # Send FindCoordinator to B -> Coordinator is A > # Send DescribeGroup to A -> NOT_COORDINATOR > # Go back to 1 > Currently, we construct a new Call object for each step in this loop, which > means we lose some of the retry bookkeeping, such as the last retry time and > the number of tries. This means it is possible to have tight retry loops which > bounce between steps 1 and 2 and do not respect the retry backoff. -- This message was sent by Atlassian Jira (v8.3.4#803005)
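As an aside, the fix direction described above amounts to sharing the retry bookkeeping across both steps of the loop. Below is a minimal sketch of that idea; the class and field names are hypothetical and this is not the actual KafkaAdminClient code.
{code:java}
// Hypothetical sketch: retry state shared by a FindCoordinator -> DescribeGroups loop,
// so that rebuilding the request object does not reset tries or backoff.
final class GroupCallRetryState {
    private int tries = 0;             // attempts so far, counted across both steps
    private long nextAllowedTryMs = 0L;

    boolean mayTry(final long nowMs, final int maxTries) {
        return tries < maxTries && nowMs >= nextAllowedTryMs;
    }

    void onFailedAttempt(final long nowMs, final long retryBackoffMs) {
        tries++;
        // The backoff applies even when the next attempt is a FindCoordinator
        // lookup rather than a retry of the same DescribeGroups request.
        nextAllowedTryMs = nowMs + retryBackoffMs;
    }
}
{code}
Because the state object outlives any single request, bouncing between steps 1 and 2 can no longer bypass the backoff.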
[jira] [Created] (KAFKA-9608) An EOS model simulation test
Boyang Chen created KAFKA-9608: -- Summary: An EOS model simulation test Key: KAFKA-9608 URL: https://issues.apache.org/jira/browse/KAFKA-9608 Project: Kafka Issue Type: Improvement Reporter: Boyang Chen Assignee: Boyang Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9118) LogDirFailureHandler shouldn't use Zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044899#comment-17044899 ] Boyang Chen commented on KAFKA-9118: [~viktorsomogyi] Hey Viktor, do you have time to make progress on this ticket? If not, I could take a look. > LogDirFailureHandler shouldn't use Zookeeper > > > Key: KAFKA-9118 > URL: https://issues.apache.org/jira/browse/KAFKA-9118 > Project: Kafka > Issue Type: Sub-task >Reporter: Viktor Somogyi-Vass >Assignee: Viktor Somogyi-Vass >Priority: Major > > As described in > [KIP-112|https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD#KIP-112:HandlediskfailureforJBOD-Zookeeper]: > {noformat} > 2. A log directory stops working on a broker during runtime > - The controller watches the path /log_dir_event_notification for new znode. > - The broker detects offline log directories during runtime. > - The broker takes actions as if it has received StopReplicaRequest for this > replica. More specifically, the replica is no longer considered leader and is > removed from any replica fetcher thread. (The clients will receive a > UnknownTopicOrPartitionException at this point) > - The broker notifies the controller by creating a sequential znode under > path /log_dir_event_notification with data of the format {"version" : 1, > "broker" : brokerId, "event" : LogDirFailure}. > - The controller reads the znode to get the brokerId and finds that the event > type is LogDirFailure. > - The controller deletes the notification znode > - The controller sends LeaderAndIsrRequest to that broker to query the state > of all topic partitions on the broker. The LeaderAndIsrResponse from this > broker will specify KafkaStorageException for those partitions that are on > the bad log directories. > - The controller updates the information of offline replicas in memory and > trigger leader election as appropriate. > - The controller removes offline replicas from ISR in the ZK and sends > LeaderAndIsrRequest with updated ISR to be used by partition leaders. > - The controller propagates the information of offline replicas to brokers by > sending UpdateMetadataRequest. > {noformat} > Instead of the notification ZNode we should use a Kafka protocol that sends a > notification message to the controller with the offline partitions. The > controller then updates the information of offline replicas in memory and > trigger leader election, then removes the replicas from ISR in ZK and sends a > LAIR and an UpdateMetadataRequest. -- This message was sent by Atlassian Jira (v8.3.4#803005)
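For reference, the ZooKeeper-based notification described in the quoted KIP text boils down to writing a sequential znode under the watched path. The sketch below uses the plain ZooKeeper client rather than Kafka's internal KafkaZkClient, and the string encoding of the event field is an assumption for illustration; only the path prefix and the {"version", "broker", "event"} payload come from the quoted description.
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public final class LogDirFailureZkNotification {
    // Writes the notification znode the controller is watching for.
    public static void notifyController(final ZooKeeper zk, final int brokerId) throws Exception {
        final String payload =
                "{\"version\":1,\"broker\":" + brokerId + ",\"event\":\"LogDirFailure\"}";
        zk.create("/log_dir_event_notification/log_dir_event_",
                payload.getBytes(StandardCharsets.UTF_8),
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
    }
}
{code}
The ticket's proposal replaces this write (and the controller's watch plus znode deletion) with a direct broker-to-controller request carrying the same information.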
[jira] [Commented] (KAFKA-9607) Should not clear partition group if the task will be revived again
[ https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044886#comment-17044886 ] ASF GitHub Bot commented on KAFKA-9607: --- abbccdda commented on pull request #8168: KAFKA-9607: Partition group should not be cleared if task will be revived URL: https://github.com/apache/kafka/pull/8168 This PR fixes the illegal state bug where a task gets revived but has no input partition assigned anymore. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Should not clear partition group if the task will be revived again > -- > > Key: KAFKA-9607 > URL: https://issues.apache.org/jira/browse/KAFKA-9607 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > We detected an issue with a corrupted task failed to revive: > {code:java} > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,137] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle > new assignment with: > New active tasks: [0_0, 3_1] > New standby tasks: [] > Existing active tasks: [0_0] > Existing standby tasks: [] > (org.apache.kafka.streams.processor.internals.TaskManager) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,138] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > [Consumer > clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, > groupId=stream-soak-test] Adding newly assigned partitions: > k8sName-id-repartition-1 > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,138] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State > transition from RUNNING to PARTITIONS_ASSIGNED > (org.apache.kafka.streams.processor.internals.StreamThread) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,419] WARN > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException > fetching records from restore consumer for partitions > [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it > is likely that the consumer's position has fallen out of the topic partition > offset range because the topic was truncated or compacted on the broker, > marking the corresponding tasks as corrupted and re-initializingit later. 
> (org.apache.kafka.streams.processor.internals.StoreChangelogReader) > [2020-02-25T08:23:38-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,139] INFO > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > [Consumer > clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, > groupId=stream-soak-test] Setting offset for partition > k8sName-id-repartition-1 to the committed offset > FetchPosition{offset=3592242, offsetEpoch=Optional.empty, > currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092 > (id: 1003 rack: null)], epoch=absent}} > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 > 16:23:38,463] ERROR > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > stream-thread > [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] > Encountered the following exception during processing and the thread is going > to shut down: (org.apache.kafka.streams.processor.internals.StreamThread) > [2020-02-25T08:23:39-08:00] > (streams-soak-trunk_soak_i-06305ad57801079ae_stre
[jira] [Created] (KAFKA-9607) Should not clear partition group if the task will be revived again
Boyang Chen created KAFKA-9607: -- Summary: Should not clear partition group if the task will be revived again Key: KAFKA-9607 URL: https://issues.apache.org/jira/browse/KAFKA-9607 Project: Kafka Issue Type: Bug Components: streams Reporter: Boyang Chen Assignee: Boyang Chen We detected an issue with a corrupted task failed to revive: {code:java} [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,137] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle new assignment with: New active tasks: [0_0, 3_1] New standby tasks: [] Existing active tasks: [0_0] Existing standby tasks: [] (org.apache.kafka.streams.processor.internals.TaskManager) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,138] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] [Consumer clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, groupId=stream-soak-test] Adding newly assigned partitions: k8sName-id-repartition-1 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,138] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State transition from RUNNING to PARTITIONS_ASSIGNED (org.apache.kafka.streams.processor.internals.StreamThread) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,419] WARN [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException fetching records from restore consumer for partitions [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it is likely that the consumer's position has fallen out of the topic partition offset range because the topic was truncated or compacted on the broker, marking the corresponding tasks as corrupted and re-initializingit later. 
(org.apache.kafka.streams.processor.internals.StoreChangelogReader) [2020-02-25T08:23:38-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,139] INFO [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] [Consumer clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer, groupId=stream-soak-test] Setting offset for partition k8sName-id-repartition-1 to the committed offset FetchPosition{offset=3592242, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092 (id: 1003 rack: null)], epoch=absent}} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 16:23:38,463] ERROR [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] stream-thread [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down: (org.apache.kafka.streams.processor.internals.StreamThread) [2020-02-25T08:23:39-08:00] (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found. at org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99) at org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651) at org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631) at org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209) at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725) {code} The root cause is that we accidentally cleanup the partition group map so that next time we reboot the task it would miss input partitions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
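To make the root cause above concrete, here is a deliberately simplified sketch (hypothetical names, not the actual TaskManager code) of the pattern being described: clearing per-task input-partition bookkeeping on a dirty close even when the task is about to be revived, so the later re-initialization cannot find its partition.
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TaskPartitionBookkeeping {
    private final Map<String, Set<String>> inputPartitionsByTask = new HashMap<>();

    void assign(final String taskId, final Set<String> partitions) {
        inputPartitionsByTask.put(taskId, new HashSet<>(partitions));
    }

    // Buggy shape: always clears the bookkeeping on a dirty close.
    void closeDirty(final String taskId) {
        inputPartitionsByTask.remove(taskId);
    }

    // Fixed shape: keep the partitions when the task is only marked corrupted
    // and will be revived with the same input partitions.
    void closeDirty(final String taskId, final boolean willRevive) {
        if (!willRevive) {
            inputPartitionsByTask.remove(taskId);
        }
    }

    Set<String> partitionsFor(final String taskId) {
        final Set<String> partitions = inputPartitionsByTask.get(taskId);
        if (partitions == null) {
            // Corresponds to the "Partition ... not found" IllegalStateException in the log above.
            throw new IllegalStateException("Partitions for task " + taskId + " not found.");
        }
        return partitions;
    }
}
{code}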
[jira] [Assigned] (KAFKA-9347) Detect deleted log directory before becoming leader
[ https://issues.apache.org/jira/browse/KAFKA-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Agam Brahma reassigned KAFKA-9347: -- Assignee: Agam Brahma > Detect deleted log directory before becoming leader > --- > > Key: KAFKA-9347 > URL: https://issues.apache.org/jira/browse/KAFKA-9347 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Agam Brahma >Priority: Major > Labels: needs-discussion > > There is no protection currently if a broker has had its log directory > deleted to prevent it from becoming the leader of a partition that it still > remains in the ISR of. This scenario can happen when the last remaining > replica in the ISR is shutdown. It will remain in the ISR and be eligible for > leadership as soon as it starts up. It would be useful to either detect this > case situation dynamically in order to force the user to do an unclean > election or recover another broker. One option might be just to pass a flag > on startup to specify that a broker should not be eligible for leadership. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9603) Number of open files keeps increasing in Streams application
[ https://issues.apache.org/jira/browse/KAFKA-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044774#comment-17044774 ] Sophie Blee-Goldman commented on KAFKA-9603: Hey [~biljazovic], can you tell whether this happened during restoration, or whether there were a large number of unclosed files long after processing resumed? The reason I ask is that during restoration Streams toggles on "bulk loading" mode, which involves turning off auto-compaction and dumping all the data from the changelog into (uncompacted) L0 files. These are the first-level files and tend to be relatively small, so you can end up with a lot of them. Once restoration is done, Streams issues a manual compaction to merge these into a smaller number of larger files, but of course you can hit system limits before it completes if you have a lot of data to restore. > Number of open files keeps increasing in Streams application > > > Key: KAFKA-9603 > URL: https://issues.apache.org/jira/browse/KAFKA-9603 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0, 2.3.1 > Environment: Spring Boot 2.2.4, OpenJDK 13, Centos image >Reporter: Bruno Iljazovic >Priority: Major > > The problem appeared when upgrading from *2.0.1* to *2.3.1*. > Relevant Kafka Streams code: > {code:java} > KStream events1 = > builder.stream(FIRST_TOPIC_NAME, Consumed.with(stringSerde, event1Serde, > event1TimestampExtractor(), null)) >.mapValues(...); > KStream events2 = > builder.stream(SECOND_TOPIC_NAME, Consumed.with(stringSerde, event2Serde, > event2TimestampExtractor(), null)) >.mapValues(...); > var joinWindows = JoinWindows.of(Duration.of(1, MINUTES).toMillis()) > .until(Duration.of(1, HOURS).toMillis()); > events2.join(events1, this::join, joinWindows, Joined.with(stringSerde, > event2Serde, event1Serde)) >.foreach(...); > {code} > The number of open *.sst files keeps increasing until it eventually hits the OS > limit (65536) and causes this exception: > {code:java} > Caused by: org.rocksdb.RocksDBException: While open a file for appending: > /.../0_8/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.157943520/001354.sst: > Too many open files > at org.rocksdb.RocksDB.flush(Native Method) > at org.rocksdb.RocksDB.flush(RocksDB.java:2394) > {code} > Here are example files that are opened and never closed: > {code:java} > /.../0_27/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000114.sst > /.../0_27/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.158245920/65.sst > /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158215680/000115.sst > /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000112.sst > /.../0_31/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158185440/51.sst > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
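While the leak itself is investigated, a common mitigation is to cap how many file handles RocksDB keeps open per store via a {{RocksDBConfigSetter}}. A small sketch follows, under the assumption that a hard cap is acceptable for the workload; the value of 500 is arbitrary and should be sized to the number of stores and the process fd limit.
{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class BoundedOpenFilesConfigSetter implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        // RocksDB will keep at most this many table files open per store,
        // at the cost of re-opening files that fall out of its table cache.
        options.setMaxOpenFiles(500);
    }
}
{code}
The class is registered through the {{rocksdb.config.setter}} Streams config ({{StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG}}).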
[jira] [Resolved] (KAFKA-8147) Add changelog topic configuration to KTable suppress
[ https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Roesler resolved KAFKA-8147. - Resolution: Fixed Merged as https://github.com/apache/kafka/commit/90640266393b530107db8256d38ec5aeba4805e1 > Add changelog topic configuration to KTable suppress > > > Key: KAFKA-8147 > URL: https://issues.apache.org/jira/browse/KAFKA-8147 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 2.1.1 >Reporter: Maarten >Assignee: highluck >Priority: Minor > Labels: kip > Fix For: 2.6.0 > > > The streams DSL does not provide a way to configure the changelog topic > created by KTable.suppress. > From the perspective of an external user this could be implemented similar to > the configuration of aggregate + materialized, i.e., > {code:java} > changelogTopicConfigs = // Configs > materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs) > .. > KGroupedStream.aggregate(..,materialized) > {code} > [KIP-446: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress|https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress] -- This message was sent by Atlassian Jira (v8.3.4#803005)
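With the change merged, usage should look roughly like the following. This is a sketch based on the API shape proposed in KIP-446 (a {{withLoggingEnabled(Map)}} option on the suppression buffer config); check the released javadoc for the final signatures.
{code:java}
import java.util.Map;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;
import org.apache.kafka.streams.kstream.Windowed;

public final class SuppressWithChangelogConfig {
    // Suppresses until the window closes, passing custom topic configs to the
    // suppression buffer's changelog topic (e.g. {"retention.ms": "..."}).
    public static KTable<Windowed<String>, Long> suppressed(
            final KTable<Windowed<String>, Long> counts,
            final Map<String, String> changelogTopicConfig) {
        return counts.suppress(Suppressed.untilWindowCloses(
                BufferConfig.unbounded().withLoggingEnabled(changelogTopicConfig)));
    }
}
{code}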
[jira] [Commented] (KAFKA-8147) Add changelog topic configuration to KTable suppress
[ https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044765#comment-17044765 ] ASF GitHub Bot commented on KAFKA-8147: --- vvcephei commented on pull request #8029: KAFKA-8147: Add changelog topic configuration to KTable suppress URL: https://github.com/apache/kafka/pull/8029 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add changelog topic configuration to KTable suppress > > > Key: KAFKA-8147 > URL: https://issues.apache.org/jira/browse/KAFKA-8147 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 2.1.1 >Reporter: Maarten >Assignee: highluck >Priority: Minor > Labels: kip > > The streams DSL does not provide a way to configure the changelog topic > created by KTable.suppress. > From the perspective of an external user this could be implemented similar to > the configuration of aggregate + materialized, i.e., > {code:java} > changelogTopicConfigs = // Configs > materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs) > .. > KGroupedStream.aggregate(..,materialized) > {code} > [KIP-446: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress|https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-8147) Add changelog topic configuration to KTable suppress
[ https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Roesler updated KAFKA-8147: Fix Version/s: 2.6.0 > Add changelog topic configuration to KTable suppress > > > Key: KAFKA-8147 > URL: https://issues.apache.org/jira/browse/KAFKA-8147 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 2.1.1 >Reporter: Maarten >Assignee: highluck >Priority: Minor > Labels: kip > Fix For: 2.6.0 > > > The streams DSL does not provide a way to configure the changelog topic > created by KTable.suppress. > From the perspective of an external user this could be implemented similar to > the configuration of aggregate + materialized, i.e., > {code:java} > changelogTopicConfigs = // Configs > materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs) > .. > KGroupedStream.aggregate(..,materialized) > {code} > [KIP-446: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress|https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9605) EOS Producer could throw illegal state if trying to complete a failed batch after fatal error
[ https://issues.apache.org/jira/browse/KAFKA-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9605: --- Description: In the Producer we could see network client hits fatal exception while trying to complete the batches after Txn manager hits fatal fenced error: {code:java} [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,673] ERROR [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker. [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,674] INFO [stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-0_0-producer, transactionalId=stream-soak-test-0_0] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,684] INFO [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Resetting sequence number of batch with current sequence 354277 for partition windowed-node-counts-0 to 354276 (org.apache.kafka.clients.producer.internals.TransactionManager) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,684] INFO [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] Resetting sequence number of batch with current sequence 354277 for partition windowed-node-counts-0 to 354276 (org.apache.kafka.clients.producer.internals.ProducerBatch) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,685] ERROR [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) java.lang.IllegalStateException: Should not reopen a batch which is already aborted. 
at org.apache.kafka.common.record.MemoryRecordsBuilder.reopenAndRewriteProducerState(MemoryRecordsBuilder.java:295) at org.apache.kafka.clients.producer.internals.ProducerBatch.resetProducerState(ProducerBatch.java:395) at org.apache.kafka.clients.producer.internals.TransactionManager.lambda$adjustSequencesDueToFailedBatch$4(TransactionManager.java:770) at org.apache.kafka.clients.producer.internals.TransactionManager$TopicPartitionEntry.resetSequenceNumbers(TransactionManager.java:180) at org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:760) at org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:735) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:671) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:662) at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:620) at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:554) at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:69) at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:745) at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:571) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:563) at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:304) at org.apache.kafka.clients.producer.interna
[jira] [Assigned] (KAFKA-9129) Add Thread ID to the InternalProcessorContext
[ https://issues.apache.org/jira/browse/KAFKA-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Cadonna reassigned KAFKA-9129: Assignee: Bruno Cadonna > Add Thread ID to the InternalProcessorContext > - > > Key: KAFKA-9129 > URL: https://issues.apache.org/jira/browse/KAFKA-9129 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Bruno Cadonna >Assignee: Bruno Cadonna >Priority: Major > > When we added client metrics we had to move the {{StreamsMetricsImpl}} object > to the client level. That means that now instead of having one > {{StreamsMetricsImpl}} object per thread, we have now one per client. That > also means that we cannot store the thread ID in the {{StreamsMetricsImpl}} > anymore. Currently, we get the thread ID from > {{Thread.currentThread().getName()}} when we need to create a sensor. > However, that is not robust against code refactoring because we need to > ensure that the thread that creates the sensor is also the one that records > the metrics. To be more flexible, we should expose the ID of the thread that > executes a processor in the {{InternalProcessorContext}} like it already > exposes the task ID. -- This message was sent by Atlassian Jira (v8.3.4#803005)
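A hypothetical sketch of the shape this could take is shown below; the real {{InternalProcessorContext}} is an internal Kafka Streams interface, and the {{threadId()}} accessor here is invented purely for illustration.
{code:java}
// Illustration only: not the actual InternalProcessorContext.
public interface ThreadAwareProcessorContext {
    String taskIdAsString();  // stands in for the existing task-id accessor

    // Proposed addition: the id of the thread currently executing the processor,
    // so sensor creation and metric recording no longer depend on
    // Thread.currentThread().getName() matching between the two call sites.
    String threadId();
}
{code}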
[jira] [Created] (KAFKA-9606) Document Metrics Changes from KIP-444
Bruno Cadonna created KAFKA-9606: Summary: Document Metrics Changes from KIP-444 Key: KAFKA-9606 URL: https://issues.apache.org/jira/browse/KAFKA-9606 Project: Kafka Issue Type: Task Components: docs, streams Affects Versions: 2.5.0 Reporter: Bruno Cadonna Fix For: 2.5.0 Changes introduced in KIP-444 shall be documented. See [https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%253A+Augment+metrics+for+Kafka+Streams] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-9606) Document Metrics Changes from KIP-444
[ https://issues.apache.org/jira/browse/KAFKA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Cadonna reassigned KAFKA-9606: Assignee: Bruno Cadonna > Document Metrics Changes from KIP-444 > - > > Key: KAFKA-9606 > URL: https://issues.apache.org/jira/browse/KAFKA-9606 > Project: Kafka > Issue Type: Task > Components: docs, streams >Affects Versions: 2.5.0 >Reporter: Bruno Cadonna >Assignee: Bruno Cadonna >Priority: Major > Fix For: 2.5.0 > > > Changes introduced in KIP-444 shall be documented. See > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%253A+Augment+metrics+for+Kafka+Streams] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9533) ValueTransform forwards `null` values
[ https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044725#comment-17044725 ] Bill Bejeck commented on KAFKA-9533: reverted cherry-picks to 2.5, 2.4, 2.3, and 2.2 > ValueTransform forwards `null` values > - > > Key: KAFKA-9533 > URL: https://issues.apache.org/jira/browse/KAFKA-9533 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, > 2.4.0, 2.3.1 >Reporter: Michael Viamari >Assignee: Michael Viamari >Priority: Major > Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1 > > > According to the documentation for `KStream#transformValues`, nulls returned > from `ValueTransformer#transform` are not forwarded. (see > [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-] > However, this does not appear to be the case. In > `KStreamTransformValuesProcessor#process` the result of the transform is > forwarded directly. > {code:java} > @Override > public void process(final K key, final V value) { > context.forward(key, valueTransformer.transform(key, value)); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9533) ValueTransform forwards `null` values
[ https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044724#comment-17044724 ] ASF GitHub Bot commented on KAFKA-9533: --- bbejeck commented on pull request #8167: KAFKA-9533: Revert ValueTransform forwards `null` URL: https://github.com/apache/kafka/pull/8167 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > ValueTransform forwards `null` values > - > > Key: KAFKA-9533 > URL: https://issues.apache.org/jira/browse/KAFKA-9533 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, > 2.4.0, 2.3.1 >Reporter: Michael Viamari >Assignee: Michael Viamari >Priority: Major > Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1 > > > According to the documentation for `KStream#transformValues`, nulls returned > from `ValueTransformer#transform` are not forwarded. (see > [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-] > However, this does not appear to be the case. In > `KStreamTransformValuesProcessor#process` the result of the transform is > forwarded directly. > {code:java} > @Override > public void process(final K key, final V value) { > context.forward(key, valueTransformer.transform(key, value)); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9533) ValueTransform forwards `null` values
[ https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044719#comment-17044719 ] Michael Viamari commented on KAFKA-9533: Ok. Thanks for the detailed explanation. I'll take a look at the changes you're proposing here as well. > ValueTransform forwards `null` values > - > > Key: KAFKA-9533 > URL: https://issues.apache.org/jira/browse/KAFKA-9533 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, > 2.4.0, 2.3.1 >Reporter: Michael Viamari >Assignee: Michael Viamari >Priority: Major > Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1 > > > According to the documentation for `KStream#transformValues`, nulls returned > from `ValueTransformer#transform` are not forwarded. (see > [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-] > However, this does not appear to be the case. In > `KStreamTransformValuesProcessor#process` the result of the transform is > forwarded directly. > {code:java} > @Override > public void process(final K key, final V value) { > context.forward(key, valueTransformer.transform(key, value)); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9605) EOS Producer could throw illegal state if trying to complete a failed batch after fatal error
[ https://issues.apache.org/jira/browse/KAFKA-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9605: --- Fix Version/s: 2.5.0 Affects Version/s: 2.5.0 2.4.0 > EOS Producer could throw illegal state if trying to complete a failed batch > after fatal error > - > > Key: KAFKA-9605 > URL: https://issues.apache.org/jira/browse/KAFKA-9605 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0, 2.5.0 >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > Fix For: 2.5.0 > > > ``` > [2020-02-24T13:23:29-08:00] > (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 > 21:23:28,673] ERROR [kafka-producer-network-thread | > stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] > [Producer > clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, > transactionalId=stream-soak-test-1_0] Aborting producer batches due to fatal > error (org.apache.kafka.clients.producer.internals.Sender) > [2020-02-24T13:23:29-08:00] > (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) > org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an > operation with an old epoch. Either there is a newer producer with the same > transactionalId, or the producer's transaction has been expired by the broker. > [2020-02-24T13:23:29-08:00] > (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 > 21:23:28,674] INFO > [stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3] > [Producer > clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-0_0-producer, > transactionalId=stream-soak-test-0_0] Closing the Kafka producer with > timeoutMillis = 9223372036854775807 ms. > (org.apache.kafka.clients.producer.KafkaProducer) > [2020-02-24T13:23:29-08:00] > (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 > 21:23:28,684] INFO [kafka-producer-network-thread | > stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] > [Producer > clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, > transactionalId=stream-soak-test-1_0] Resetting sequence number of batch > with current sequence 354277 for partition windowed-node-counts-0 to 354276 > (org.apache.kafka.clients.producer.internals.TransactionManager) > [2020-02-24T13:23:29-08:00] > (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 > 21:23:28,684] INFO [kafka-producer-network-thread | > stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] > Resetting sequence number of batch with current sequence 354277 for > partition windowed-node-counts-0 to 354276 > (org.apache.kafka.clients.producer.internals.ProducerBatch) > [2020-02-24T13:23:29-08:00] > (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 > 21:23:28,685] ERROR [kafka-producer-network-thread | > stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] > [Producer > clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, > transactionalId=stream-soak-test-1_0] Uncaught error in request completion: > (org.apache.kafka.clients.NetworkClient) > [2020-02-24T13:23:29-08:00] > (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) > java.lang.IllegalStateException: Should not reopen a batch which is already > aborted. 
> at > org.apache.kafka.common.record.MemoryRecordsBuilder.reopenAndRewriteProducerState(MemoryRecordsBuilder.java:295) > at > org.apache.kafka.clients.producer.internals.ProducerBatch.resetProducerState(ProducerBatch.java:395) > at > org.apache.kafka.clients.producer.internals.TransactionManager.lambda$adjustSequencesDueToFailedBatch$4(TransactionManager.java:770) > at > org.apache.kafka.clients.producer.internals.TransactionManager$TopicPartitionEntry.resetSequenceNumbers(TransactionManager.java:180) > at > org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:760) > at > org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:735) > at > org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:671) > at > org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:662) > at > org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:620) > at > org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:554) >
[jira] [Created] (KAFKA-9605) EOS Producer could throw illegal state if trying to complete a failed batch after fatal error
Boyang Chen created KAFKA-9605: -- Summary: EOS Producer could throw illegal state if trying to complete a failed batch after fatal error Key: KAFKA-9605 URL: https://issues.apache.org/jira/browse/KAFKA-9605 Project: Kafka Issue Type: Bug Reporter: Boyang Chen Assignee: Boyang Chen ``` [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,673] ERROR [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker. [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,674] INFO [stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-0_0-producer, transactionalId=stream-soak-test-0_0] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,684] INFO [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Resetting sequence number of batch with current sequence 354277 for partition windowed-node-counts-0 to 354276 (org.apache.kafka.clients.producer.internals.TransactionManager) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,684] INFO [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] Resetting sequence number of batch with current sequence 354277 for partition windowed-node-counts-0 to 354276 (org.apache.kafka.clients.producer.internals.ProducerBatch) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 21:23:28,685] ERROR [kafka-producer-network-thread | stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer] [Producer clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer, transactionalId=stream-soak-test-1_0] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient) [2020-02-24T13:23:29-08:00] (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) java.lang.IllegalStateException: Should not reopen a batch which is already aborted. 
at org.apache.kafka.common.record.MemoryRecordsBuilder.reopenAndRewriteProducerState(MemoryRecordsBuilder.java:295) at org.apache.kafka.clients.producer.internals.ProducerBatch.resetProducerState(ProducerBatch.java:395) at org.apache.kafka.clients.producer.internals.TransactionManager.lambda$adjustSequencesDueToFailedBatch$4(TransactionManager.java:770) at org.apache.kafka.clients.producer.internals.TransactionManager$TopicPartitionEntry.resetSequenceNumbers(TransactionManager.java:180) at org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:760) at org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:735) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:671) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:662) at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:620) at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:554) at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:69) at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:745) at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:571) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:563) at org.apache.kafka.clients.producer.internals.Sender.runOnce(S
[jira] [Commented] (KAFKA-9562) Streams not making progress under heavy failures with EOS enabled on 2.5 branch
[ https://issues.apache.org/jira/browse/KAFKA-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044693#comment-17044693 ] Boyang Chen commented on KAFKA-9562: We are soaking the 2.5 with/without EOS for another 2 weeks to make sure the stream side is in good shape. > Streams not making progress under heavy failures with EOS enabled on 2.5 > branch > --- > > Key: KAFKA-9562 > URL: https://issues.apache.org/jira/browse/KAFKA-9562 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.5.0 >Reporter: John Roesler >Assignee: Boyang Chen >Priority: Blocker > Fix For: 2.5.0 > > > During soak testing in preparation for the 2.5.0 release, we have discovered > a case in which Streams appears to stop making progress. Specifically, this > is a failure-resilience test in which we inject network faults separating the > instances from the brokers roughly every twenty minutes. > On 2.4, Streams would obviously spend a lot of time rebalancing under this > scenario, but would still make progress. However, on the current 2.5 branch, > Streams effectively stops making progress except rarely. > This appears to be a severe regression, so I'm filing this ticket as a 2.5.0 > release blocker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-9596) Hanging test case `testMaxLogCompactionLag`
[ https://issues.apache.org/jira/browse/KAFKA-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Agam Brahma reassigned KAFKA-9596: -- Assignee: Agam Brahma > Hanging test case `testMaxLogCompactionLag` > --- > > Key: KAFKA-9596 > URL: https://issues.apache.org/jira/browse/KAFKA-9596 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Agam Brahma >Priority: Major > > Saw this on a recent build: > {code} > 15:18:59 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag STARTED > 18:19:25 Build timed out (after 270 minutes). Marking the build as aborted. > 18:19:25 Build was aborted > 18:19:25 [FINDBUGS] Skipping publisher since build result is ABORTED > 18:19:25 Recording test results > 18:19:25 Setting MAVEN_LATEST__HOME=/home/jenkins/tools/maven/latest/ > 18:19:25 Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3 > 18:19:27 > 18:19:27 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag SKIPPED > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-7787) Add error specifications to KAFKA-7609
[ https://issues.apache.org/jira/browse/KAFKA-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044630#comment-17044630 ] Tom Bentley commented on KAFKA-7787: [~cmccabe] any feedback on my comment? > Add error specifications to KAFKA-7609 > -- > > Key: KAFKA-7787 > URL: https://issues.apache.org/jira/browse/KAFKA-7787 > Project: Kafka > Issue Type: Sub-task >Reporter: Colin McCabe >Assignee: Tom Bentley >Priority: Minor > > In our RPC JSON, it would be nice if we could specify what versions of a > response could contain what errors. See the discussion here: > https://github.com/apache/kafka/pull/5893#discussion_r244841051 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9308) Misses SAN after certificate creation
[ https://issues.apache.org/jira/browse/KAFKA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison resolved KAFKA-9308. --- Fix Version/s: 2.6.0 Resolution: Fixed > Misses SAN after certificate creation > - > > Key: KAFKA-9308 > URL: https://issues.apache.org/jira/browse/KAFKA-9308 > Project: Kafka > Issue Type: Bug > Components: documentation >Affects Versions: 2.3.1 >Reporter: Agostino Sarubbo >Assignee: Sönke Liebau >Priority: Minor > Fix For: 2.6.0 > > > Hello, > I followed the documentation to use Kafka with SSL; however, by the end of the > procedure the specified SAN is lost. > To test, run (after the first keytool command and again after the last one): > > {code:java} > keytool -list -v -keystore server.keystore.jks > {code} > Reference: > [http://kafka.apache.org/documentation.html#security_ssl] > > {code:java} > #!/bin/bash > #Step 1 > keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg > RSA -genkey -ext SAN=DNS:test.test.com > #Step 2 > openssl req -new -x509 -keyout ca-key -out ca-cert -days 365 > keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert > keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert > #Step 3 > keytool -keystore server.keystore.jks -alias localhost -certreq -file > cert-file > openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed > -days 365 -CAcreateserial -passin pass:test1234 > keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert > keytool -keystore server.keystore.jks -alias localhost -import -file > cert-signed > {code} > > In detail, the SAN is lost after: > {code:java} > keytool -keystore server.keystore.jks -alias localhost -import -file > cert-signed > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-9308) Misses SAN after certificate creation
[ https://issues.apache.org/jira/browse/KAFKA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison reassigned KAFKA-9308: - Assignee: Sönke Liebau > Misses SAN after certificate creation > - > > Key: KAFKA-9308 > URL: https://issues.apache.org/jira/browse/KAFKA-9308 > Project: Kafka > Issue Type: Bug > Components: documentation >Affects Versions: 2.3.1 >Reporter: Agostino Sarubbo >Assignee: Sönke Liebau >Priority: Minor > > Hello, > I followed the documentation to use Kafka with SSL; however, by the end of the > procedure the specified SAN is lost. > To test, run (after the first keytool command and again after the last one): > > {code:java} > keytool -list -v -keystore server.keystore.jks > {code} > Reference: > [http://kafka.apache.org/documentation.html#security_ssl] > > {code:java} > #!/bin/bash > #Step 1 > keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg > RSA -genkey -ext SAN=DNS:test.test.com > #Step 2 > openssl req -new -x509 -keyout ca-key -out ca-cert -days 365 > keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert > keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert > #Step 3 > keytool -keystore server.keystore.jks -alias localhost -certreq -file > cert-file > openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed > -days 365 -CAcreateserial -passin pass:test1234 > keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert > keytool -keystore server.keystore.jks -alias localhost -import -file > cert-signed > {code} > > In detail, the SAN is lost after: > {code:java} > keytool -keystore server.keystore.jks -alias localhost -import -file > cert-signed > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9066) Kafka Connect JMX : source & sink task metrics missing for tasks in failed state
[ https://issues.apache.org/jira/browse/KAFKA-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044624#comment-17044624 ] Simon Trigona commented on KAFKA-9066: -- We're also experiencing this. > Kafka Connect JMX : source & sink task metrics missing for tasks in failed > state > > > Key: KAFKA-9066 > URL: https://issues.apache.org/jira/browse/KAFKA-9066 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.1.1 >Reporter: Mikołaj Stefaniak >Priority: Major > > h2. Overview > Kafka Connect exposes various metrics via JMX. Those metrics can be exported > i.e. by _Prometheus JMX Exporter_ for further processing. > One of crucial attributes is connector's *task status.* > According to official Kafka docs, status is available as +status+ attribute > of following MBean: > {quote}kafka.connect:type=connector-task-metrics,connector="\{connector}",task="\{task}"status > - The status of the connector task. One of 'unassigned', 'running', > 'paused', 'failed', or 'destroyed'. > {quote} > h2. Issue > Generally +connector-task-metrics+ are exposed propery for tasks in +running+ > status but not exposed at all if task is +failed+. > Failed Task *appears* properly with failed status when queried via *REST API*: > > {code:java} > $ curl -X GET -u 'user:pass' > http://kafka-connect.mydomain.com/connectors/customerconnector/status > {"name":"customerconnector","connector":{"state":"RUNNING","worker_id":"kafka-connect.mydomain.com:8080"},"tasks":[{"id":0,"state":"FAILED","worker_id":"kafka-connect.mydomain.com:8080","trace":"org.apache.kafka.connect.errors.ConnectException: > Received DML 'DELETE FROM mysql.rds_sysinfo .."}],"type":"source"} > $ {code} > > Failed Task *doesn't appear* as bean with +connector-task-metrics+ type when > queried via *JMX*: > > {code:java} > $ echo "beans -d kafka.connect" | java -jar > target/jmxterm-1.1.0-SNAPSHOT-uber.jar -l localhost:8081 -n -v silent | grep > connector=customerconnector > kafka.connect:connector=customerconnector,task=0,type=task-error-metricskafka.connect:connector=customerconnector,type=connector-metrics > $ > {code} > h2. Expected result > It is expected, that bean with +connector-task-metrics+ type will appear also > for tasks that failed. > Below is example of how beans are properly registered for tasks in Running > state: > > {code:java} > $ echo "get -b > kafka.connect:connector=sinkConsentSubscription-1000,task=0,type=connector-task-metrics > status" | java -jar target/jmxterm-1.1.0-SNAPSHOT-ube r.jar -l > localhost:8081 -n -v silent > status = running; > $ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
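For anyone trying to confirm the behaviour programmatically, the same check the jmxterm commands above perform can be done with the standard JMX client API. The host, port, and connector name below are placeholders; only the MBean pattern and the {{status}} attribute come from the docs quoted in the description.
{code:java}
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ConnectTaskStatusProbe {
    public static void main(final String[] args) throws Exception {
        final JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:8081/jmxrmi");
        final JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            final MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Failed tasks are the ones for which no matching bean shows up here.
            final ObjectName pattern = new ObjectName(
                    "kafka.connect:type=connector-task-metrics,connector=customerconnector,task=*");
            final Set<ObjectName> beans = mbs.queryNames(pattern, null);
            for (final ObjectName bean : beans) {
                System.out.println(bean + " -> status=" + mbs.getAttribute(bean, "status"));
            }
        } finally {
            connector.close();
        }
    }
}
{code}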
[jira] [Reopened] (KAFKA-9533) ValueTransform forwards `null` values
[ https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Roesler reopened KAFKA-9533: - > ValueTransform forwards `null` values > - > > Key: KAFKA-9533 > URL: https://issues.apache.org/jira/browse/KAFKA-9533 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, > 2.4.0, 2.3.1 >Reporter: Michael Viamari >Assignee: Michael Viamari >Priority: Major > Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1 > > > According to the documentation for `KStream#transformValues`, nulls returned > from `ValueTransformer#transform` are not forwarded. (see > [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-] > However, this does not appear to be the case. In > `KStreamTransformValuesProcessor#process` the result of the transform is > forwarded directly. > {code:java} > @Override > public void process(final K key, final V value) { > context.forward(key, valueTransformer.transform(key, value)); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9533) ValueTransform forwards `null` values
[ https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044602#comment-17044602 ] John Roesler commented on KAFKA-9533: - Hi [~mviamari], Since merging this fix into trunk, we've discovered some use cases that actually depend on the prior behavior. In retrospect, I think that we should fix this issue by correcting the docs, not changing the behavior. I bet this started with a simple copy/paste error from the {{transform}} API, in which returning {{null}} (as opposed to a {{KeyValue}}) does mean to drop the record. But in that case, it makes sense, since we cannot process a record with no key. The choice is between throwing an NPE and dropping the record, so we drop the record. But this doesn't apply to {{transformValues}}, because the result would have the same key as the input. Really, it's murky territory either way... Since we have {{transform}} and {{flatTransform}}, as well as the {{xValues}} overloads, it implies that {{transform}} is one-in-to-one-out and {{flatTransform}} is one-in-to-zero-or-more-out. In that case, if you did want to drop an input record, you should use a {{flatTransform}} and return an empty collection. There _should_ be an invariant for {{transform}} and {{transformValues}} that you get exactly the same number of output records as input records, whether they are {{null}} or otherwise. Since it seems like the docs, not the behavior, are wrong, and since this is pre-existing behavior going back a long way in Streams (changing it would break some users' code), we should go ahead and back out the PR. Sorry for the confusion. As far as the actual fix goes, I'd be in favor of amending the docs to state that: 1. When {{KStream#transformValues}} returns {{null}}, there will simply be an output record with a {{null}} value. 2. When {{KStream#transform}} returns {{null}}, we consider that to be an invalid return, and we will log a warning while dropping the record (similar to other APIs in Streams). 3. When {{KStream#transform}} returns a {{KeyValue(key, null)}}, there will simply be an output record with a null value. 4. When {{KStream#flatTransform}} and {{KStream#flatTransformValues}} return {{null}}, we consider that to be equivalent to returning an empty collection, in which case, we don't forward any results, and also do not log a warning. (I'd also be in favor of logging a warning here for consistency, but it seems like overkill). Accordingly, I'm going to re-open this ticket, and Bill will revert the changes to fix the downstream builds. Sorry again for the trouble. -John > ValueTransform forwards `null` values > - > > Key: KAFKA-9533 > URL: https://issues.apache.org/jira/browse/KAFKA-9533 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, > 2.4.0, 2.3.1 >Reporter: Michael Viamari >Assignee: Michael Viamari >Priority: Major > Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1 > > > According to the documentation for `KStream#transformValues`, nulls returned > from `ValueTransformer#transform` are not forwarded. (see > [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-]) > However, this does not appear to be the case. In > `KStreamTransformValuesProcessor#process` the result of the transform is > forwarded directly. 
> {code:java} > @Override > public void process(final K key, final V value) { > context.forward(key, valueTransformer.transform(key, value)); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
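To make the semantics proposed in the comment above concrete, here is a small illustrative sketch (topic name and transformation logic are made up, and this is not the committed fix): with transformValues, a null return would simply be forwarded as a record with a null value, whereas with flatTransformValues, returning an empty collection is the explicit way to drop a record.
{code:java}
import java.util.Collections;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueTransformer;
import org.apache.kafka.streams.kstream.ValueTransformerSupplier;
import org.apache.kafka.streams.processor.ProcessorContext;

public class NullHandlingSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");

        // transformValues: under the semantics described above, returning null simply
        // produces an output record with a null value (same key, same record count).
        input.transformValues((ValueTransformerSupplier<String, String>) () ->
                new ValueTransformer<String, String>() {
                    @Override public void init(final ProcessorContext context) { }
                    @Override public String transform(final String value) {
                        return value.isEmpty() ? null : value.toUpperCase();
                    }
                    @Override public void close() { }
                });

        // flatTransformValues: returning an empty collection is the explicit way
        // to drop a record; returning a singleton keeps it.
        input.flatTransformValues((ValueTransformerSupplier<String, Iterable<String>>) () ->
                new ValueTransformer<String, Iterable<String>>() {
                    @Override public void init(final ProcessorContext context) { }
                    @Override public Iterable<String> transform(final String value) {
                        return value.isEmpty()
                                ? Collections.emptyList()
                                : Collections.singletonList(value.toUpperCase());
                    }
                    @Override public void close() { }
                });
    }
}
{code}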
[jira] [Resolved] (KAFKA-9567) Docs and system tests for ZooKeeper 3.5.7 and KIP-515
[ https://issues.apache.org/jira/browse/KAFKA-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-9567. -- Fix Version/s: 2.5.0 Resolution: Fixed Issue resolved by pull request 8132 [https://github.com/apache/kafka/pull/8132] > Docs and system tests for ZooKeeper 3.5.7 and KIP-515 > - > > Key: KAFKA-9567 > URL: https://issues.apache.org/jira/browse/KAFKA-9567 > Project: Kafka > Issue Type: Improvement >Affects Versions: 2.5.0 >Reporter: Ron Dagostino >Priority: Blocker > Fix For: 2.5.0 > > > These changes depend on [KIP-515: Enable ZK client to use the new TLS > supported > authentication|https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Enable+ZK+client+to+use+the+new+TLS+supported+authentication], > which was only added to 2.5.0. The upgrade to ZooKeeper 3.5.7 was merged to > both 2.5.0 and 2.4.1 via https://issues.apache.org/jira/browse/KAFKA-9515, > but this change must only be merged to 2.5.0 (it will break the system tests > if merged to 2.4.1). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9567) Docs and system tests for ZooKeeper 3.5.7 and KIP-515
[ https://issues.apache.org/jira/browse/KAFKA-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044511#comment-17044511 ] ASF GitHub Bot commented on KAFKA-9567: --- omkreddy commented on pull request #8132: KAFKA-9567: Docs, system tests for ZooKeeper 3.5.7 URL: https://github.com/apache/kafka/pull/8132 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Docs and system tests for ZooKeeper 3.5.7 and KIP-515 > - > > Key: KAFKA-9567 > URL: https://issues.apache.org/jira/browse/KAFKA-9567 > Project: Kafka > Issue Type: Improvement >Affects Versions: 2.5.0 >Reporter: Ron Dagostino >Priority: Blocker > > These changes depend on [KIP-515: Enable ZK client to use the new TLS > supported > authentication|https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Enable+ZK+client+to+use+the+new+TLS+supported+authentication], > which was only added to 2.5.0. The upgrade to ZooKeeper 3.5.7 was merged to > both 2.5.0 and 2.4.1 via https://issues.apache.org/jira/browse/KAFKA-9515, > but this change must only be merged to 2.5.0 (it will break the system tests > if merged to 2.4.1). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9604) Cluster crash
[ https://issues.apache.org/jira/browse/KAFKA-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maksim Larionov updated KAFKA-9604: --- Description: Good day! On one of the servers in the cluster the disk space ran out. During cleanup, some *.log files of some replicas in log.dirs were deleted by mistake. When the retention time was reached, the cleanup kicked in and the physical file 07607076.log could not be found. The broker shut down abnormally. [2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base offsets [7607076] due to retention time 60480ms breach (kafka.log.Log) [2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 7607076, size 131228281] for deletion. (kafka.log.Log) [2020-02-06 13:32:48,979] ERROR Error while deleting segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data (kafka.server.LogDirFailureChannel) java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log Suppressed: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log -> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted [2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving replicas in dir /data/ocswf/kafka_broker/kafka-data (kafka.server.ReplicaManager) [2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler) org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data Caused by: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log Suppressed: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log -> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted ... [2020-02-06 13:32:49,058] INFO Stopping serving logs in dir /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager) [2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager) After that, all the remaining cluster nodes shut down abnormally during partition leader election: [2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making broker the leader for partition Topic: ocs.counter-balances; Partition: 40; Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager) org.apache.kafka.common.errors.KafkaStorageException: Error while writing to checkpoint file /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint Caused by: java.io.FileNotFoundException: /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp (No such file or directory) They could not rewrite the leader-epoch-checkpoint file and stopped for that reason: [2020-02-06 13:32:53,687] INFO Stopping serving logs in dir /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager) [2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager) Is this situation expected behavior?
[jira] [Commented] (KAFKA-3881) Remove the replacing logic from "." to "_" in Fetcher
[ https://issues.apache.org/jira/browse/KAFKA-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044383#comment-17044383 ] ASF GitHub Bot commented on KAFKA-3881: --- tombentley commented on pull request #3407: KAFKA-3881: Remove the replacing logic from "." to "_" in Fetcher URL: https://github.com/apache/kafka/pull/3407 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove the replacing logic from "." to "_" in Fetcher > - > > Key: KAFKA-3881 > URL: https://issues.apache.org/jira/browse/KAFKA-3881 > Project: Kafka > Issue Type: Bug > Components: consumer, metrics >Reporter: Guozhang Wang >Assignee: Tom Bentley >Priority: Major > Labels: newbie, patch-available > > The logic of replacing "." with "_" in metric names / tags was originally > introduced in the core package's metrics since Graphite treats "." as a > hierarchy separator (see KAFKA-1902); for the client metrics, the > GraphiteReporter is supposed to take care of this itself rather than having > Kafka metrics special-case it. In addition, right now only the consumer > Fetcher has this replacement; the producer Sender does not. > So we should consider removing this logic from the consumer Fetcher's metrics > package. NOTE that this is a backward-incompatible public API change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
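As an illustration of where such name sanitization could live instead (this is not the actual patch from the linked PR), a custom MetricsReporter could rewrite dots only when handing metrics to a dot-sensitive backend such as Graphite; the hypothetical class below only logs the sanitized names.
{code:java}
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

// Hypothetical reporter: Kafka's metric names stay untouched, and dots are only
// rewritten at the edge where the downstream system treats them as separators.
public class DotSanitizingReporter implements MetricsReporter {

    private static String graphitePath(final MetricName name) {
        // Sanitize each component separately, then join with dots as hierarchy separators.
        return name.group().replace('.', '_') + "." + name.name().replace('.', '_');
    }

    @Override
    public void init(final List<KafkaMetric> metrics) {
        metrics.forEach(this::metricChange);
    }

    @Override
    public void metricChange(final KafkaMetric metric) {
        // A real reporter would register or forward the metric here; this sketch only logs it.
        System.out.println("registered " + graphitePath(metric.metricName()));
    }

    @Override
    public void metricRemoval(final KafkaMetric metric) {
        System.out.println("removed " + graphitePath(metric.metricName()));
    }

    @Override
    public void close() { }

    @Override
    public void configure(final Map<String, ?> configs) { }
}
{code}
A reporter along these lines would be attached through the standard metric.reporters client configuration, so the Fetcher itself would not need any replacement logic.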
[jira] [Resolved] (KAFKA-5554) Hilight config settings for particular common use cases
[ https://issues.apache.org/jira/browse/KAFKA-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Bentley resolved KAFKA-5554. Resolution: Abandoned > Hilight config settings for particular common use cases > --- > > Key: KAFKA-5554 > URL: https://issues.apache.org/jira/browse/KAFKA-5554 > Project: Kafka > Issue Type: Improvement > Components: documentation >Reporter: Tom Bentley >Assignee: Tom Bentley >Priority: Minor > > Judging by the sorts of questions seen on the mailing list, Stack Overflow, > etc., it seems common for users to assume that Kafka will default to settings > which won't lose messages. They start using Kafka and at some later time find > messages have been lost. > While it's not our fault if users don't read the documentation, there's a lot > of configuration documentation to digest and it's easy for people to miss an > important setting. > Therefore, I'd like to suggest that in addition to the current configuration > docs we add a short section highlighting those settings which pertain to > common use cases, such as: > * configs to avoid lost messages > * configs for low latency > I'm sure some users will continue not to read the documentation, but when > they inevitably start asking questions it means people can respond with "have > you configured everything as described here?" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-6359) Work for KIP-236
[ https://issues.apache.org/jira/browse/KAFKA-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Bentley resolved KAFKA-6359. Resolution: Implemented This was addressed by KIP-455 instead. > Work for KIP-236 > > > Key: KAFKA-6359 > URL: https://issues.apache.org/jira/browse/KAFKA-6359 > Project: Kafka > Issue Type: Improvement >Reporter: Tom Bentley >Assignee: GEORGE LI >Priority: Minor > > This issue is for the work described in KIP-236. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-5517) Support linking to particular configuration parameters
[ https://issues.apache.org/jira/browse/KAFKA-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044373#comment-17044373 ] ASF GitHub Bot commented on KAFKA-5517: --- tombentley commented on pull request #3436: KAFKA-5517: Add id to config HTML tables to allow linking URL: https://github.com/apache/kafka/pull/3436 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support linking to particular configuration parameters > -- > > Key: KAFKA-5517 > URL: https://issues.apache.org/jira/browse/KAFKA-5517 > Project: Kafka > Issue Type: Improvement > Components: documentation >Reporter: Tom Bentley >Assignee: Tom Bentley >Priority: Minor > Labels: patch-available > > Currently the configuration parameters are documented in long tables, and it's > only possible to link to the heading before a particular table. When > discussing configuration parameters on forums it would be helpful to be able > to link to the particular parameter under discussion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-5554) Hilight config settings for particular common use cases
[ https://issues.apache.org/jira/browse/KAFKA-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044371#comment-17044371 ] ASF GitHub Bot commented on KAFKA-5554: --- tombentley commented on pull request #3486: KAFKA-5554 doc config common URL: https://github.com/apache/kafka/pull/3486 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Hilight config settings for particular common use cases > --- > > Key: KAFKA-5554 > URL: https://issues.apache.org/jira/browse/KAFKA-5554 > Project: Kafka > Issue Type: Improvement > Components: documentation >Reporter: Tom Bentley >Assignee: Tom Bentley >Priority: Minor > > Judging by the sorts of questions seen on the mailing list, Stack Overflow, > etc., it seems common for users to assume that Kafka will default to settings > which won't lose messages. They start using Kafka and at some later time find > messages have been lost. > While it's not our fault if users don't read the documentation, there's a lot > of configuration documentation to digest and it's easy for people to miss an > important setting. > Therefore, I'd like to suggest that in addition to the current configuration > docs we add a short section highlighting those settings which pertain to > common use cases, such as: > * configs to avoid lost messages > * configs for low latency > I'm sure some users will continue not to read the documentation, but when > they inevitably start asking questions it means people can respond with "have > you configured everything as described here?" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-6359) Work for KIP-236
[ https://issues.apache.org/jira/browse/KAFKA-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044367#comment-17044367 ] ASF GitHub Bot commented on KAFKA-6359: --- tombentley commented on pull request #4330: [WIP] KAFKA-6359: KIP-236 interruptible reassignments URL: https://github.com/apache/kafka/pull/4330 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Work for KIP-236 > > > Key: KAFKA-6359 > URL: https://issues.apache.org/jira/browse/KAFKA-6359 > Project: Kafka > Issue Type: Improvement >Reporter: Tom Bentley >Assignee: GEORGE LI >Priority: Minor > > This issue is for the work described in KIP-236. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9604) Cluster crash
Maksim Larionov created KAFKA-9604: -- Summary: Cluster crash Key: KAFKA-9604 URL: https://issues.apache.org/jira/browse/KAFKA-9604 Project: Kafka Issue Type: Bug Components: core Affects Versions: 2.2.1 Reporter: Maksim Larionov Good day! On one of the servers in the cluster the disk space ran out. During cleanup, some *.log files of some replicas in log.dirs were deleted by mistake. When the retention time was reached, the cleanup kicked in and the physical file 07607076.log could not be found. The broker shut down abnormally. [2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base offsets [7607076] due to retention time 60480ms breach (kafka.log.Log) [2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 7607076, size 131228281] for deletion. (kafka.log.Log) [2020-02-06 13:32:48,979] ERROR Error while deleting segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data (kafka.server.LogDirFailureChannel) java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log Suppressed: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log -> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted [2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving replicas in dir /data/ocswf/kafka_broker/kafka-data (kafka.server.ReplicaManager) [2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler) org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data Caused by: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log Suppressed: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log -> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted ... [2020-02-06 13:32:49,058] INFO Stopping serving logs in dir /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager) [2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager) After that, all the remaining cluster nodes shut down abnormally during partition leader election: [2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making broker the leader for partition Topic: ocs.counter-balances; Partition: 40; Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager) org.apache.kafka.common.errors.KafkaStorageException: Error while writing to checkpoint file /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint Caused by: java.io.FileNotFoundException: /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp (No such file or directory) They could not rewrite the leader-epoch-checkpoint file and stopped for that reason: [2020-02-06 13:32:53,687] INFO Stopping serving logs in dir /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager) [2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager) Is this situation expected behavior? 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9572) Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some Records
[ https://issues.apache.org/jira/browse/KAFKA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044308#comment-17044308 ] Bruno Cadonna commented on KAFKA-9572: -- Sounds good to me. It is a pity that we cannot backport the potential fix, but I see that it would be really hard. > Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some > Records > --- > > Key: KAFKA-9572 > URL: https://issues.apache.org/jira/browse/KAFKA-9572 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Bruno Cadonna >Assignee: Guozhang Wang >Priority: Blocker > Fix For: 2.5.0 > > Attachments: 7-changelog-1.txt, data-1.txt, streams22.log, > streams23.log, streams30.log, sum-1.txt > > > System test {{StreamsEosTest.test_failure_and_recovery}} failed due to a > wrongly computed aggregation under exactly-once (EOS). The specific error is: > {code:java} > Exception in thread "main" java.lang.RuntimeException: Result verification > failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset > = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value > size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = > [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698> > at > org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444) > at > org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196) > at > org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69) > {code} > That means, the sum computed by the Streams app seems to be wrong for key > 6069. I checked the dumps of the log segments of the input topic partition > (attached: data-1.txt) and indeed two input records are not considered in the > sum. With those two missed records the sum would be correct. More concretely, > the input values for key 6069 are: > # 147 > # 9250 > # 5340 > # 1231 > # 1301 > The sum of this values is 17269 as stated in the exception above as expected > sum. If you subtract values 3 and 4, i.e., 5340 and 1231 from 17269, you get > 10698 , which is the actual sum in the exception above. Somehow those two > values are missing. > In the log dump of the output topic partition (attached: sum-1.txt), the sum > is correct until the 4th value 1231 , i.e. 15968, then it is overwritten with > 10698. > In the log dump of the changelog topic of the state store that stores the sum > (attached: 7-changelog-1.txt), the sum is also overwritten as in the output > topic. > I attached the logs of the three Streams instances involved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9603) Number of open files keeps increasing in Streams application
Bruno Iljazovic created KAFKA-9603: -- Summary: Number of open files keeps increasing in Streams application Key: KAFKA-9603 URL: https://issues.apache.org/jira/browse/KAFKA-9603 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 2.3.1, 2.4.0 Environment: Spring Boot 2.2.4, OpenJDK 13, Centos image Reporter: Bruno Iljazovic Problem appeared when upgrading from *2.0.1* to *2.3.1*. Relevant Kafka Streams code: {code:java} KStream events1 = builder.stream(FIRST_TOPIC_NAME, Consumed.with(stringSerde, event1Serde, event1TimestampExtractor(), null)) .mapValues(...); KStream events2 = builder.stream(SECOND_TOPIC_NAME, Consumed.with(stringSerde, event2Serde, event2TimestampExtractor(), null)) .mapValues(...); var joinWindows = JoinWindows.of(Duration.of(1, MINUTES).toMillis()) .until(Duration.of(1, HOURS).toMillis()); events2.join(events1, this::join, joinWindows, Joined.with(stringSerde, event2Serde, event1Serde)) .foreach(...); {code} Number of open *.sst files keeps increasing until eventually it hits the os limit (65536) and causes this exception: {code:java} Caused by: org.rocksdb.RocksDBException: While open a file for appending: /.../0_8/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.157943520/001354.sst: Too many open files at org.rocksdb.RocksDB.flush(Native Method) at org.rocksdb.RocksDB.flush(RocksDB.java:2394) {code} Here are example files that are opened and never closed: {code:java} /.../0_27/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000114.sst /.../0_27/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.158245920/65.sst /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158215680/000115.sst /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000112.sst /.../0_31/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158185440/51.sst {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
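As a side note, the join in the report above still uses the deprecated until() call. Purely as an illustrative sketch (not a fix for the growing file-handle count), roughly the same window can be expressed with the non-deprecated Duration-based API, where the join stores' retention is derived from the window size plus the grace period:
{code:java}
import java.time.Duration;
import org.apache.kafka.streams.kstream.JoinWindows;

public class JoinWindowSketch {
    public static void main(String[] args) {
        // Roughly equivalent to JoinWindows.of(1 minute).until(1 hour): records may differ
        // by at most 1 minute, and out-of-order records are accepted for up to 1 hour,
        // which also bounds how long the underlying RocksDB join stores retain data.
        JoinWindows joinWindows = JoinWindows.of(Duration.ofMinutes(1))
                                             .grace(Duration.ofHours(1));

        System.out.println("window size ms: " + joinWindows.size()
                + ", grace period ms: " + joinWindows.gracePeriodMs());
    }
}
{code}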