[jira] [Commented] (KAFKA-9594) speed up the processing of LeaderAndIsrRequest

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045151#comment-17045151
 ] 

ASF GitHub Bot commented on KAFKA-9594:
---

omkreddy commented on pull request #8153: KAFKA-9594: Add a separate lock to 
pause the follower log append while checking if the log dir could be replaced.
URL: https://github.com/apache/kafka/pull/8153
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> speed up the processing of LeaderAndIsrRequest
> --
>
> Key: KAFKA-9594
> URL: https://issues.apache.org/jira/browse/KAFKA-9594
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Assignee: Manikumar
>Priority: Minor
> Fix For: 2.6.0
>
>
> Observations from [~junrao]:
> Currently, Partition.makeFollower() holds a write lock on 
> leaderIsrUpdateLock, while Partition.doAppendRecordsToFollowerOrFutureReplica() 
> holds a read lock on leaderIsrUpdateLock. So, if there is an ongoing log 
> append on the follower, the makeFollower() call will be delayed. The path is 
> a bit different when serving the Partition.makeLeader() call: before we call 
> Partition.makeLeader(), we first remove the follower from the 
> replicaFetcherThread, so the makeLeader() call won't be delayed by a log 
> append. This means that when we change one follower to become leader and 
> another follower to follow the new leader during a controlled shutdown, the 
> makeLeader() call typically completes faster than the makeFollower() call, 
> which can delay the follower fetching from the new leader and cause the ISR to 
> shrink.
> The only reason that Partition.doAppendRecordsToFollowerOrFutureReplica() 
> needs to hold a read lock on leaderIsrUpdateLock is for 
> Partition.maybeReplaceCurrentWithFutureReplica() to pause the log append 
> while checking whether the log dir could be replaced. We could potentially add 
> a separate lock (something like futureLogLock) that's synchronized between 
> maybeReplaceCurrentWithFutureReplica() and 
> doAppendRecordsToFollowerOrFutureReplica(). Then, 
> doAppendRecordsToFollowerOrFutureReplica() wouldn't need to hold the lock on 
> leaderIsrUpdateLock.
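
A minimal sketch of the separate-lock idea described above, in Java for illustration (the real Partition class is Scala, and the lock name and method bodies here are assumptions, not the merged patch): follower appends synchronize on a dedicated futureLogLock instead of taking the read side of leaderIsrUpdateLock, so makeFollower() no longer waits behind an in-flight append.

{code:java}
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only; not the actual kafka.cluster.Partition code.
class PartitionLockingSketch {
    private final ReadWriteLock leaderIsrUpdateLock = new ReentrantReadWriteLock();
    private final Object futureLogLock = new Object();    // assumed name from the description

    void makeFollower() {
        leaderIsrUpdateLock.writeLock().lock();            // leadership changes still take the write lock
        try {
            // update leader epoch, truncate the log, etc.
        } finally {
            leaderIsrUpdateLock.writeLock().unlock();
        }
    }

    void doAppendRecordsToFollowerOrFutureReplica(byte[] records) {
        synchronized (futureLogLock) {                     // no longer holds leaderIsrUpdateLock
            // append records to the current or future log
        }
    }

    void maybeReplaceCurrentWithFutureReplica() {
        synchronized (futureLogLock) {                     // pauses follower appends during the check
            // check whether the future log caught up and swap log directories
        }
    }
}
{code}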



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9594) speed up the processing of LeaderAndIsrRequest

2020-02-25 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-9594.
--
Resolution: Fixed

Issue resolved by pull request 8153
[https://github.com/apache/kafka/pull/8153]

> speed up the processing of LeaderAndIsrRequest
> --
>
> Key: KAFKA-9594
> URL: https://issues.apache.org/jira/browse/KAFKA-9594
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Assignee: Manikumar
>Priority: Minor
> Fix For: 2.6.0
>
>
> Observations from [~junrao]:
> Currently, Partition.makeFollower() holds a write lock on 
> leaderIsrUpdateLock, while Partition.doAppendRecordsToFollowerOrFutureReplica() 
> holds a read lock on leaderIsrUpdateLock. So, if there is an ongoing log 
> append on the follower, the makeFollower() call will be delayed. The path is 
> a bit different when serving the Partition.makeLeader() call: before we call 
> Partition.makeLeader(), we first remove the follower from the 
> replicaFetcherThread, so the makeLeader() call won't be delayed by a log 
> append. This means that when we change one follower to become leader and 
> another follower to follow the new leader during a controlled shutdown, the 
> makeLeader() call typically completes faster than the makeFollower() call, 
> which can delay the follower fetching from the new leader and cause the ISR to 
> shrink.
> The only reason that Partition.doAppendRecordsToFollowerOrFutureReplica() 
> needs to hold a read lock on leaderIsrUpdateLock is for 
> Partition.maybeReplaceCurrentWithFutureReplica() to pause the log append 
> while checking whether the log dir could be replaced. We could potentially add 
> a separate lock (something like futureLogLock) that's synchronized between 
> maybeReplaceCurrentWithFutureReplica() and 
> doAppendRecordsToFollowerOrFutureReplica(). Then, 
> doAppendRecordsToFollowerOrFutureReplica() wouldn't need to hold the lock on 
> leaderIsrUpdateLock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8713) [Connect] JsonConverter NULL Values are replaced by default values even in NULLABLE fields

2020-02-25 Thread Cheng Pan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045149#comment-17045149
 ] 

Cheng Pan commented on KAFKA-8713:
--

[~rhauch], thanks for your reply. I totally agree with you that this is an 
incompatible change, and I also think that introducing a configuration 
property is a good choice. I'd like to follow the process and come up with a 
KIP draft to push this issue forward.

> [Connect] JsonConverter NULL Values are replaced by default values even in 
> NULLABLE fields
> --
>
> Key: KAFKA-8713
> URL: https://issues.apache.org/jira/browse/KAFKA-8713
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.3.0, 2.2.1
>Reporter: Cheng Pan
>Priority: Major
>  Labels: needs-kip
>
> Class JsonConverter line: 582
> {code:java}
> private static JsonNode convertToJson(Schema schema, Object logicalValue) {
>     if (logicalValue == null) {
>         if (schema == null) // Any schema is valid and we don't have a default, so treat this as an optional schema
>             return null;
>         if (schema.defaultValue() != null)
>             return convertToJson(schema, schema.defaultValue());
>         if (schema.isOptional())
>             return JsonNodeFactory.instance.nullNode();
>         throw new DataException("Conversion error: null value for field that is required and has no default value");
>     }
> 
> }
> {code}
> h1. Expect:
> The value `null` is valid for an optional field, even though the field has a 
> default value.
>  Only when the field is required should the converter fall back to the 
> default value when the value is `null`.
> h1. Actual:
> The default value is always returned if `null` was given.
> h1. Example:
> I'm not sure if the current behavior is exactly what is expected, but at 
> least on MySQL, given a table defined as
> {code:sql}
> create table t1 (
>     name varchar(40) not null,
>     create_time datetime default '1999-01-01 11:11:11' null,
>     update_time datetime default '1999-01-01 11:11:11' null
> );
> {code}
> Just insert a record:
> {code:sql}
> INSERT INTO `t1` (`name`,  `update_time`) VALUES ('kafka', null);
> {code}
> The result is:
> {code:json}
> {
> "name": "kafka",
> "create_time": "1999-01-01 11:11:11",
> "update_time": null
> }
> {code}
> But when I use Debezium to pull the binlog and send the record to Kafka with 
> JsonConverter, the result changes to:
> {code:json}
> {
> "name": "kafka",
> "create_time": "1999-01-01 11:11:11",
> "update_time": "1999-01-01 11:11:11"
> }
> {code}
> For more details, see: https://issues.jboss.org/browse/DBZ-1064
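
A minimal sketch of the expected ordering from the description above (an assumed helper, not the actual JsonConverter patch): an explicit null stays null for optional fields, and the schema default is only used as a fallback for required fields.

{code:java}
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.errors.DataException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;

class ExpectedNullConversionSketch {
    static JsonNode convertNullToJson(Schema schema) {
        if (schema == null || schema.isOptional()) {
            // null is a legal value for an optional field, even if a default exists
            return JsonNodeFactory.instance.nullNode();
        }
        if (schema.defaultValue() != null) {
            // required field: fall back to the default (stand-in for convertToJson)
            return JsonNodeFactory.instance.textNode(String.valueOf(schema.defaultValue()));
        }
        throw new DataException("null value for a required field that has no default value");
    }
}
{code}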



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9602) Incorrect close of producer instance during partition assignment

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045131#comment-17045131
 ] 

ASF GitHub Bot commented on KAFKA-9602:
---

guozhangwang commented on pull request #8166: KAFKA-9602: Close the stream 
internal producer only in EOS
URL: https://github.com/apache/kafka/pull/8166
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incorrect close of producer instance during partition assignment
> 
>
> Key: KAFKA-9602
> URL: https://issues.apache.org/jira/browse/KAFKA-9602
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Closing the new StreamProducer instance doesn't distinguish between an EOS 
> and a non-EOS shutdown. The StreamProducer should take care of that.
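
A hedged sketch of the distinction described above (class and method names are assumptions, not the actual Streams internals): the wrapped producer is only closed when exactly-once is enabled, since only then is it owned per task; otherwise the shared producer must stay open.

{code:java}
import org.apache.kafka.clients.producer.Producer;

class StreamProducerCloseSketch {
    private final Producer<byte[], byte[]> producer;
    private final boolean eosEnabled;

    StreamProducerCloseSketch(Producer<byte[], byte[]> producer, boolean eosEnabled) {
        this.producer = producer;
        this.eosEnabled = eosEnabled;
    }

    void closeOnTaskShutdown() {
        if (eosEnabled) {
            producer.close();   // task-owned transactional producer: must be closed here
        }
        // non-EOS: the producer is shared across tasks, so leave it open
    }
}
{code}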



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9602) Incorrect close of producer instance during partition assignment

2020-02-25 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-9602.
--
Resolution: Fixed

> Incorrect close of producer instance during partition assignment
> 
>
> Key: KAFKA-9602
> URL: https://issues.apache.org/jira/browse/KAFKA-9602
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Closing the new StreamProducer instance doesn't distinguish between an EOS 
> and a non-EOS shutdown. The StreamProducer should take care of that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9498) Topic validation during the creation trigger unnecessary TopicChange events

2020-02-25 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-9498.

Fix Version/s: 2.6.0
   Resolution: Fixed

Merged the PR to trunk.

> Topic validation during the creation trigger unnecessary TopicChange events 
> 
>
> Key: KAFKA-9498
> URL: https://issues.apache.org/jira/browse/KAFKA-9498
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 2.6.0
>
>
> I have found out that the topic validation logic, which is executed when a 
> CreateTopicPolicy is configured or when validateOnly is set, triggers 
> unnecessary TopicChange events in the controller. In the worst case, it can 
> trigger up to one event per created topic, which can overload the controller.
> This happens because the validation logic reads all the topics from ZK using 
> the method getAllTopicsInCluster provided by the KafkaZkClient. This method 
> registers a watch every time the topics are read from ZooKeeper.
> I think that we should make the watch registration optional for this call in 
> order to avoid this unwanted behaviour.
>  
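
A small sketch of the proposal above (interface and method names are assumptions for illustration, not the real KafkaZkClient API): validation-only reads of the topic list can skip ZooKeeper watch registration, while the controller's normal read path keeps it.

{code:java}
import java.util.Set;

interface TopicStoreSketch {
    // registerWatch=false returns a point-in-time snapshot without installing a ZK watch
    Set<String> getAllTopics(boolean registerWatch);
}

class TopicValidationSketch {
    static void validateNewTopic(TopicStoreSketch store, String newTopic) {
        // validation only needs a snapshot, so no watch and no extra TopicChange events
        Set<String> existing = store.getAllTopics(false);
        if (existing.contains(newTopic)) {
            throw new IllegalArgumentException("Topic already exists: " + newTopic);
        }
    }
}
{code}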



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9498) Topic validation during the creation trigger unnecessary TopicChange events

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045071#comment-17045071
 ] 

ASF GitHub Bot commented on KAFKA-9498:
---

junrao commented on pull request #8062: KAFKA-9498; Topic validation during the 
topic creation triggers unnecessary TopicChange events
URL: https://github.com/apache/kafka/pull/8062
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Topic validation during the creation trigger unnecessary TopicChange events 
> 
>
> Key: KAFKA-9498
> URL: https://issues.apache.org/jira/browse/KAFKA-9498
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
>
> I have found out that the topic validation logic, which is executed when a 
> CreateTopicPolicy is configured or when validateOnly is set, triggers 
> unnecessary TopicChange events in the controller. In the worst case, it can 
> trigger up to one event per created topic, which can overload the controller.
> This happens because the validation logic reads all the topics from ZK using 
> the method getAllTopicsInCluster provided by the KafkaZkClient. This method 
> registers a watch every time the topics are read from ZooKeeper.
> I think that we should make the watch registration optional for this call in 
> order to avoid this unwanted behaviour.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9572) Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some Records

2020-02-25 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-9572.
--
Fix Version/s: (was: 2.5.0)
   2.6.0
   Resolution: Fixed

> Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some 
> Records
> ---
>
> Key: KAFKA-9572
> URL: https://issues.apache.org/jira/browse/KAFKA-9572
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.0
>Reporter: Bruno Cadonna
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: 7-changelog-1.txt, data-1.txt, streams22.log, 
> streams23.log, streams30.log, sum-1.txt
>
>
> System test {{StreamsEosTest.test_failure_and_recovery}} failed due to a 
> wrongly computed aggregation under exactly-once (EOS). The specific error is:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Result verification 
> failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset 
> = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value 
> size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = 
> [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698>
>   at 
> org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444)
>   at 
> org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196)
>   at 
> org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69)
> {code} 
> That means the sum computed by the Streams app seems to be wrong for key 
> 6069. I checked the dumps of the log segments of the input topic partition 
> (attached: data-1.txt) and indeed two input records are not considered in the 
> sum. With those two missed records the sum would be correct. More concretely, 
> the input values for key 6069 are:
> # 147
> # 9250
> # 5340
> # 1231
> # 1301
> The sum of these values is 17269, which matches the expected sum stated in 
> the exception above. If you subtract values 3 and 4, i.e., 5340 and 1231, 
> from 17269, you get 10698, which is the actual sum in the exception above. 
> Somehow those two values are missing (see the quick arithmetic check below).
> In the log dump of the output topic partition (attached: sum-1.txt), the sum 
> is correct up to the 4th value 1231, i.e. 15968, and is then overwritten with 
> 10698.
> In the log dump of the changelog topic of the state store that stores the sum 
> (attached: 7-changelog-1.txt), the sum is also overwritten as in the output 
> topic.
> I attached the logs of the three Streams instances involved.
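
A quick check of the arithmetic in the description (plain Java, values copied from the list above):

{code:java}
public class SumCheck {
    public static void main(String[] args) {
        int[] values = {147, 9250, 5340, 1231, 1301};
        int full = 0;
        for (int v : values) {
            full += v;                                  // 17269, the expected sum for key 6069
        }
        int missingTwo = full - 5340 - 1231;            // 10698, the wrong sum actually observed
        int upToFourth = 147 + 9250 + 5340 + 1231;      // 15968, last correct value in sum-1.txt
        System.out.println(full + ", " + missingTwo + ", " + upToFourth);
    }
}
{code}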



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9610) Should not throw illegal state exception during task revocation

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045002#comment-17045002
 ] 

ASF GitHub Bot commented on KAFKA-9610:
---

abbccdda commented on pull request #8169: KAFKA-9610: do not throw illegal 
state when remaining partitions are not empty
URL: https://github.com/apache/kafka/pull/8169
 
 
   For `handleRevocation`, it is possible that a previous onAssignment callback 
has already cleaned up the stream tasks, which means no corresponding task can 
be found for the given partitions. We should not throw here, as this is 
expected behavior.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Should not throw illegal state exception during task revocation
> ---
>
> Key: KAFKA-9610
> URL: https://issues.apache.org/jira/browse/KAFKA-9610
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> In the handleRevocation call, the remaining partitions could cause an illegal 
> state exception during task revocation. This should also be fixed, as the 
> situation is expected when the tasks have already been cleared by the 
> assignor's onAssignment callback.
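
A minimal sketch of the intended fix (assumed types and names, not the actual TaskManager code): during handleRevocation, partitions whose task was already removed by an earlier onAssignment cleanup are skipped instead of triggering an IllegalStateException.

{code:java}
import java.util.Collection;
import java.util.Map;

class HandleRevocationSketch {
    void handleRevocation(Collection<String> revokedPartitions, Map<String, Runnable> taskByPartition) {
        for (String partition : revokedPartitions) {
            Runnable suspendTask = taskByPartition.get(partition);
            if (suspendTask == null) {
                // expected: a previous onAssignment callback already cleaned this task up
                continue;
            }
            suspendTask.run();   // stand-in for suspending/committing the task as usual
        }
    }
}
{code}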



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9607) Should not clear partition queue during task close

2020-02-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9607:
---
Description: 
We detected an issue where a corrupted task failed to revive:
{code:java}
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,137] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle 
new assignment with:
        New active tasks: [0_0, 3_1]
        New standby tasks: []
        Existing active tasks: [0_0]
        Existing standby tasks: [] 
(org.apache.kafka.streams.processor.internals.TaskManager)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,138] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
 groupId=stream-soak-test] Adding newly assigned partitions: 
k8sName-id-repartition-1 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,138] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State 
transition from RUNNING to PARTITIONS_ASSIGNED 
(org.apache.kafka.streams.processor.internals.StreamThread)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,419] WARN 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException 
fetching records from restore consumer for partitions 
[stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it is 
likely that the consumer's position has fallen out of the topic partition 
offset range because the topic was truncated or compacted on the broker, 
marking the corresponding tasks as corrupted and re-initializingit later. 
(org.apache.kafka.streams.processor.internals.StoreChangelogReader)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,139] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
 groupId=stream-soak-test] Setting offset for partition 
k8sName-id-repartition-1 to the committed offset FetchPosition{offset=3592242, 
offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
 (id: 1003 rack: null)], epoch=absent}} 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,463] ERROR 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
Encountered the following exception during processing and the thread is going 
to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) 
java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found.
        at 
org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209)
        at 
org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725)
{code}
The root cause is that we accidentally clean up the partition group map, so that 
the next time we revive the task it misses its input partitions.

By not cleaning up the partition group, we may add a slight GC overhead, which 
is acceptable. In terms of correctness, there is currently no way to revive the 
task with its partitions reassigned.
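
A rough sketch of the close-versus-revive distinction described above (names are assumptions, not the actual Streams task code): partition queues are only cleared on a final close, so a task revived after corruption still knows its input partitions.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class TaskCloseSketch {
    private final Map<String, Deque<byte[]>> partitionQueues = new HashMap<>();

    void addPartition(String partition) {
        partitionQueues.putIfAbsent(partition, new ArrayDeque<>());
    }

    void close(boolean finalClose) {
        // flush state, release stores, etc. here
        if (finalClose) {
            partitionQueues.clear();   // the task will never come back, so the queues can go
        }
        // close-for-revival: keep partitionQueues so reinitialization still finds
        // every input partition and setting partition time does not fail
    }
}
{code}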

  was:
We detected an issue where a corrupted task failed to revive:
{code:java}
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57

[jira] [Updated] (KAFKA-9607) Should not clear partition queue during task close

2020-02-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9607:
---
Summary: Should not clear partition queue during task close  (was: Should 
not clear partition queue or throw illegal state exception during task 
revocation)

> Should not clear partition queue during task close
> --
>
> Key: KAFKA-9607
> URL: https://issues.apache.org/jira/browse/KAFKA-9607
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> We detected an issue where a corrupted task failed to revive:
> {code:java}
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,137] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle 
> new assignment with:
>         New active tasks: [0_0, 3_1]
>         New standby tasks: []
>         Existing active tasks: [0_0]
>         Existing standby tasks: [] 
> (org.apache.kafka.streams.processor.internals.TaskManager)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,138] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
>  groupId=stream-soak-test] Adding newly assigned partitions: 
> k8sName-id-repartition-1 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,138] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State 
> transition from RUNNING to PARTITIONS_ASSIGNED 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,419] WARN 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException 
> fetching records from restore consumer for partitions 
> [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it 
> is likely that the consumer's position has fallen out of the topic partition 
> offset range because the topic was truncated or compacted on the broker, 
> marking the corresponding tasks as corrupted and re-initializingit later. 
> (org.apache.kafka.streams.processor.internals.StoreChangelogReader)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,139] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
>  groupId=stream-soak-test] Setting offset for partition 
> k8sName-id-repartition-1 to the committed offset 
> FetchPosition{offset=3592242, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
>  (id: 1003 rack: null)], epoch=absent}} 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,463] ERROR 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> Encountered the following exception during processing and the thread is going 
> to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) 
> java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found.
>         at 
> org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209)
>         at 
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(St

[jira] [Updated] (KAFKA-9607) Should not clear partition queue or throw illegal state exception during task revocation

2020-02-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9607:
---
Description: 
We detected an issue where a corrupted task failed to revive:
{code:java}
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,137] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle 
new assignment with:
        New active tasks: [0_0, 3_1]
        New standby tasks: []
        Existing active tasks: [0_0]
        Existing standby tasks: [] 
(org.apache.kafka.streams.processor.internals.TaskManager)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,138] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
 groupId=stream-soak-test] Adding newly assigned partitions: 
k8sName-id-repartition-1 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,138] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State 
transition from RUNNING to PARTITIONS_ASSIGNED 
(org.apache.kafka.streams.processor.internals.StreamThread)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,419] WARN 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException 
fetching records from restore consumer for partitions 
[stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it is 
likely that the consumer's position has fallen out of the topic partition 
offset range because the topic was truncated or compacted on the broker, 
marking the corresponding tasks as corrupted and re-initializingit later. 
(org.apache.kafka.streams.processor.internals.StoreChangelogReader)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,139] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
 groupId=stream-soak-test] Setting offset for partition 
k8sName-id-repartition-1 to the committed offset FetchPosition{offset=3592242, 
offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
 (id: 1003 rack: null)], epoch=absent}} 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,463] ERROR 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
Encountered the following exception during processing and the thread is going 
to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) 
java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found.
        at 
org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209)
        at 
org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725)
{code}
The root cause is that we accidentally clean up the partition group map, so that 
the next time we revive the task it misses its input partitions.

  was:
1. We detected an issue where a corrupted task failed to revive:
{code:java}
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,137] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32

[jira] [Created] (KAFKA-9610) Should not throw illegal state exception during task revocation

2020-02-25 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9610:
--

 Summary: Should not throw illegal state exception during task 
revocation
 Key: KAFKA-9610
 URL: https://issues.apache.org/jira/browse/KAFKA-9610
 Project: Kafka
  Issue Type: Bug
Reporter: Boyang Chen
Assignee: Boyang Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9610) Should not throw illegal state exception during task revocation

2020-02-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9610:
---
Description: In the handleRevocation call, the remaining partitions could cause 
an illegal state exception during task revocation. This should also be fixed, as 
the situation is expected when the tasks have already been cleared by the 
assignor's onAssignment callback.

> Should not throw illegal state exception during task revocation
> ---
>
> Key: KAFKA-9610
> URL: https://issues.apache.org/jira/browse/KAFKA-9610
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> In the handleRevocation call, the remaining partitions could cause an illegal 
> state exception during task revocation. This should also be fixed, as the 
> situation is expected when the tasks have already been cleared by the 
> assignor's onAssignment callback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-7906) Improve failed leader election logging

2020-02-25 Thread Agam Brahma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Agam Brahma reassigned KAFKA-7906:
--

Assignee: Agam Brahma

> Improve failed leader election logging
> --
>
> Key: KAFKA-7906
> URL: https://issues.apache.org/jira/browse/KAFKA-7906
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Agam Brahma
>Priority: Major
>
> We often see annoying log messages like the following in the state change log:
> {code}
> [2019-02-05 00:02:51,307] ERROR [Controller id=13 epoch=14] Controller 13 
> epoch 14 failed to change state for partition topic-3 from OnlinePartition to 
> OnlinePartition
>  (state.change.logger)
> kafka.common.StateChangeFailedException: Failed to elect leader for partition 
> topic-3 under strategy PreferredReplicaPartitionLeaderElectionStrategy
> at 
> kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:328)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:326)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:326)
> at 
> kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:254)
> at 
> kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:175)
> at 
> kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
> {code}
> The stack trace is not adding any value and the message doesn't explain why 
> the election failed. You have to read the code to figure it out. It's also 
> curious that you have to look in the state change log for failed leader 
> elections in the first place. It would be more intuitive to put these in the 
> controller log. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9607) Should not clear partition queue or throw illegal state exception during task revocation

2020-02-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9607:
---
Description: 
1. We detected an issue where a corrupted task failed to revive:
{code:java}
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,137] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle 
new assignment with:
        New active tasks: [0_0, 3_1]
        New standby tasks: []
        Existing active tasks: [0_0]
        Existing standby tasks: [] 
(org.apache.kafka.streams.processor.internals.TaskManager)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,138] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
 groupId=stream-soak-test] Adding newly assigned partitions: 
k8sName-id-repartition-1 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,138] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State 
transition from RUNNING to PARTITIONS_ASSIGNED 
(org.apache.kafka.streams.processor.internals.StreamThread)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,419] WARN 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException 
fetching records from restore consumer for partitions 
[stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it is 
likely that the consumer's position has fallen out of the topic partition 
offset range because the topic was truncated or compacted on the broker, 
marking the corresponding tasks as corrupted and re-initializingit later. 
(org.apache.kafka.streams.processor.internals.StoreChangelogReader)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,139] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
 groupId=stream-soak-test] Setting offset for partition 
k8sName-id-repartition-1 to the committed offset FetchPosition{offset=3592242, 
offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
 (id: 1003 rack: null)], epoch=absent}} 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,463] ERROR 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
Encountered the following exception during processing and the thread is going 
to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) 
java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found.
        at 
org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209)
        at 
org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725)
{code}
The root cause is that we accidentally clean up the partition group map, so that 
the next time we revive the task it misses its input partitions.

2. Another issue tracked in this ticket is that the remaining partitions could 
cause an illegal state exception during task revocation. This should also be 
fixed, as the situation is expected when the tasks have already been cleared by 
the assignor's onAssignment callback.

  was:
We detected an issue where a corrupted task failed to revive:
{code:java}
[2020-0

[jira] [Created] (KAFKA-9609) Memory Leak in Kafka Producer

2020-02-25 Thread Satish (Jira)
Satish created KAFKA-9609:
-

 Summary: Memory Leak in Kafka Producer
 Key: KAFKA-9609
 URL: https://issues.apache.org/jira/browse/KAFKA-9609
 Project: Kafka
  Issue Type: Bug
  Components: clients, producer 
Affects Versions: 2.4.0
Reporter: Satish


org.apache.kafka.clients.producer.internals.Sender adds topic metrics for every 
topic that we write messages to, but they are never cleaned up until we close 
the producer.

This is an issue if we use a single producer and write messages to a large 
number of dynamic topics (e.g. ~500 topics per hour). As this metrics map 
accumulates an entry for every topic, memory usage increases gradually over 
time.

It can easily be reproduced by writing messages to a large number of dynamic 
topics using the same KafkaProducer from the Apache Kafka client libraries or 
KafkaTemplate from Spring.
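
A minimal reproduction sketch of the accumulation described above, using only standard producer APIs; the broker address and topic names are illustrative assumptions.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DynamicTopicRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // every new topic name adds per-topic metrics in the Sender that are
            // only released when the producer itself is closed
            for (int i = 0; i < 500; i++) {
                producer.send(new ProducerRecord<>("dynamic-topic-" + i, "key", "value"));
            }
        }
    }
}
{code}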

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9607) Should not clear partition queue or throw illegal state exception during task revocation

2020-02-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9607:
---
Summary: Should not clear partition queue or throw illegal state exception 
during task revocation  (was: Should not clear partition group if the task will 
be revived again)

> Should not clear partition queue or throw illegal state exception during task 
> revocation
> 
>
> Key: KAFKA-9607
> URL: https://issues.apache.org/jira/browse/KAFKA-9607
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> We detected an issue where a corrupted task failed to revive:
> {code:java}
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,137] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle 
> new assignment with:
>         New active tasks: [0_0, 3_1]
>         New standby tasks: []
>         Existing active tasks: [0_0]
>         Existing standby tasks: [] 
> (org.apache.kafka.streams.processor.internals.TaskManager)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,138] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
>  groupId=stream-soak-test] Adding newly assigned partitions: 
> k8sName-id-repartition-1 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,138] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State 
> transition from RUNNING to PARTITIONS_ASSIGNED 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,419] WARN 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException 
> fetching records from restore consumer for partitions 
> [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it 
> is likely that the consumer's position has fallen out of the topic partition 
> offset range because the topic was truncated or compacted on the broker, 
> marking the corresponding tasks as corrupted and re-initializingit later. 
> (org.apache.kafka.streams.processor.internals.StoreChangelogReader)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,139] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
>  groupId=stream-soak-test] Setting offset for partition 
> k8sName-id-repartition-1 to the committed offset 
> FetchPosition{offset=3592242, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
>  (id: 1003 rack: null)], epoch=absent}} 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,463] ERROR 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> Encountered the following exception during processing and the thread is going 
> to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) 
> java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found.
>         at 
> org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209)
>         at 
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager

[jira] [Assigned] (KAFKA-9347) Detect deleted log directory before becoming leader

2020-02-25 Thread Agam Brahma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Agam Brahma reassigned KAFKA-9347:
--

Assignee: (was: Agam Brahma)

> Detect deleted log directory before becoming leader
> ---
>
> Key: KAFKA-9347
> URL: https://issues.apache.org/jira/browse/KAFKA-9347
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Priority: Major
>  Labels: needs-discussion
>
> There is currently no protection to prevent a broker whose log directory has 
> been deleted from becoming the leader of a partition that it still remains in 
> the ISR of. This scenario can happen when the last remaining replica in the 
> ISR is shut down: it will remain in the ISR and be eligible for leadership as 
> soon as it starts up. It would be useful to detect this situation dynamically 
> in order to force the user to either do an unclean election or recover 
> another broker. One option might be just to pass a flag on startup to specify 
> that the broker should not be eligible for leadership.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8713) [Connect] JsonConverter NULL Values are replaced by default values even in NULLABLE fields

2020-02-25 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-8713:
-
Labels: needs-kip  (was: )

> [Connect] JsonConverter NULL Values are replaced by default values even in 
> NULLABLE fields
> --
>
> Key: KAFKA-8713
> URL: https://issues.apache.org/jira/browse/KAFKA-8713
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.3.0, 2.2.1
>Reporter: Cheng Pan
>Priority: Major
>  Labels: needs-kip
>
> Class JsonConverter line: 582
> {code:java}
> private static JsonNode convertToJson(Schema schema, Object logicalValue) {
>     if (logicalValue == null) {
>         if (schema == null) // Any schema is valid and we don't have a default, so treat this as an optional schema
>             return null;
>         if (schema.defaultValue() != null)
>             return convertToJson(schema, schema.defaultValue());
>         if (schema.isOptional())
>             return JsonNodeFactory.instance.nullNode();
>         throw new DataException("Conversion error: null value for field that is required and has no default value");
>     }
> 
> }
> {code}
> h1. Expect:
> The value `null` is valid for an optional field, even though the field has a 
> default value.
>  Only when the field is required should the converter fall back to the 
> default value when the value is `null`.
> h1. Actual:
> The default value is always returned if `null` was given.
> h1. Example:
> I'm not sure if the current behavior is exactly what is expected, but at 
> least on MySQL, given a table defined as
> {code:sql}
> create table t1 (
>     name varchar(40) not null,
>     create_time datetime default '1999-01-01 11:11:11' null,
>     update_time datetime default '1999-01-01 11:11:11' null
> );
> {code}
> Just insert a record:
> {code:sql}
> INSERT INTO `t1` (`name`,  `update_time`) VALUES ('kafka', null);
> {code}
> The result is:
> {code:json}
> {
> "name": "kafka",
> "create_time": "1999-01-01 11:11:11",
> "update_time": null
> }
> {code}
> But when I use Debezium to pull the binlog and send the record to Kafka with 
> JsonConverter, the result changes to:
> {code:json}
> {
> "name": "kafka",
> "create_time": "1999-01-01 11:11:11",
> "update_time": "1999-01-01 11:11:11"
> }
> {code}
> For more details, see: https://issues.jboss.org/browse/DBZ-1064



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8713) [Connect] JsonConverter NULL Values are replaced by default values even in NULLABLE fields

2020-02-25 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044972#comment-17044972
 ] 

Randall Hauch commented on KAFKA-8713:
--

[~pan3793], I agree that the current behavior is incorrect. But correcting this 
would change behavior, and we'll have to worry about how this affects existing 
users that upgrade to a version where this might get fixed. And AK's policy is 
that such changes require a KIP to identify all potential compatibility 
concerns. For example, we should consider whether we need to introduce a 
configuration property on the JSON converter that enables/disables this 
behavior so that existing users don't see any changes unless they want to 
enable it.
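
A hedged sketch of how such an opt-in property could gate the behavior (the property name and helper below are assumptions for illustration, not an existing JsonConverter setting):

{code:java}
import org.apache.kafka.connect.data.Schema;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;

class ConfigurableNullHandlingSketch {
    // e.g. read from a converter config key such as "replace.null.with.default" (assumed name)
    private final boolean replaceNullWithDefault;

    ConfigurableNullHandlingSketch(boolean replaceNullWithDefault) {
        this.replaceNullWithDefault = replaceNullWithDefault;
    }

    JsonNode convertNull(Schema schema) {
        if (schema != null && schema.isOptional() && !replaceNullWithDefault) {
            // opt-in behavior: preserve the explicit null for optional fields
            return JsonNodeFactory.instance.nullNode();
        }
        if (schema != null && schema.defaultValue() != null) {
            // legacy behavior (default): fall back to the schema default
            return JsonNodeFactory.instance.textNode(String.valueOf(schema.defaultValue()));
        }
        return JsonNodeFactory.instance.nullNode();
    }
}
{code}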

> [Connect] JsonConverter NULL Values are replaced by default values even in 
> NULLABLE fields
> --
>
> Key: KAFKA-8713
> URL: https://issues.apache.org/jira/browse/KAFKA-8713
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.3.0, 2.2.1
>Reporter: Cheng Pan
>Priority: Major
>
> Class JsonConverter line: 582
> {code:java}
> private static JsonNode convertToJson(Schema schema, Object logicalValue) {
>     if (logicalValue == null) {
>         if (schema == null) // Any schema is valid and we don't have a default, so treat this as an optional schema
>             return null;
>         if (schema.defaultValue() != null)
>             return convertToJson(schema, schema.defaultValue());
>         if (schema.isOptional())
>             return JsonNodeFactory.instance.nullNode();
>         throw new DataException("Conversion error: null value for field that is required and has no default value");
>     }
> 
> }
> {code}
> h1. Expect:
> The value `null` is valid for an optional field, even though the field has a 
> default value.
>  Only when the field is required should the converter fall back to the 
> default value when the value is `null`.
> h1. Actual:
> The default value is always returned if `null` was given.
> h1. Example:
> I'm not sure if the current behavior is exactly what is expected, but at 
> least on MySQL, given a table defined as
> {code:sql}
> create table t1 (
>     name varchar(40) not null,
>     create_time datetime default '1999-01-01 11:11:11' null,
>     update_time datetime default '1999-01-01 11:11:11' null
> );
> {code}
> Just insert a record:
> {code:sql}
> INSERT INTO `t1` (`name`,  `update_time`) VALUES ('kafka', null);
> {code}
> The result is:
> {code:json}
> {
> "name": "kafka",
> "create_time": "1999-01-01 11:11:11",
> "update_time": null
> }
> {code}
> But when I use Debezium to pull the binlog and send the record to Kafka with 
> JsonConverter, the result changes to:
> {code:json}
> {
> "name": "kafka",
> "create_time": "1999-01-01 11:11:11",
> "update_time": "1999-01-01 11:11:11"
> }
> {code}
> For more details, see: https://issues.jboss.org/browse/DBZ-1064



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8458) Flaky Test AdminClientIntegrationTest#testElectPreferredLeaders

2020-02-25 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044956#comment-17044956
 ] 

Boyang Chen commented on KAFKA-8458:


h3. Error Message

org.junit.ComparisonFailure: expected:<... for topic partition[]> but was:<... 
for topic partition[.]>
h3. Stacktrace

org.junit.ComparisonFailure: expected:<... for topic partition[]> but was:<... for topic partition[.]>
	at org.junit.Assert.assertEquals(Assert.java:117)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at kafka.api.PlaintextAdminIntegrationTest.testElectPreferredLeaders(PlaintextAdminIntegrationTest.scala:1288)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:834)

> Flaky Test AdminClientIntegrationTest#testElectPreferredLeaders
> ---
>
> Key: KAFKA-8458
> URL: https://issues.apache.org/jira/browse/KAFKA-8458
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: flaky-test
>
> Failed locally:
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout.
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>   at 
> kafka.api.AdminClientIntegrationTest.testElectPreferredLeaders(AdminClientIntegrationTest.scala:1282)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Aborted due to 
> timeout.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-9596) Hanging test case `testMaxLogCompactionLag`

2020-02-25 Thread Agam Brahma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044950#comment-17044950
 ] 

Agam Brahma edited comment on KAFKA-9596 at 2/25/20 10:16 PM:
--

Wasn't able to repro this failure. Passed on trunk (`5216da3d`) with 100 runs:

 

{{$ for i in $(seq 1 100); do ./gradlew :core:cleanTest :core:test --tests 
"kafka.log.LogCleanerIntegrationTest.testMaxLogCompactionLag" | rg 
'testMaxLogCompactionLag'; done > /tmp/testsummary.txt}}

 

{{$ diff <(cat /tmp/testsummary.txt | wc -l) <(rg 'PASS' /tmp/testsummary.txt | 
wc -l) >/dev/null; echo $?}}
 {{0}}

 

{{(will leave it open until this test hangs/fails again)}}

 


was (Author: agam):
Wasn't able to repro this failure. Passed on trunk (`5216da3d`) with 100 runs:

 

{{$ for i in $(seq 1 100); do ./gradlew :core:cleanTest :core:test --tests 
"kafka.log.LogCleanerIntegrationTest.testMaxLogCompactionLag" | rg 
'testMaxLogCompactionLag'; done > /tmp/testsummary.txt}}

 

{{$ diff <(cat /tmp/testsummary.txt | wc -l) <(rg 'PASS' /tmp/testsummary.txt | 
wc -l) >/dev/null; echo $?}}
{{0}}

 

{{(will leave it open until this test hangs/fails again)}}

 

> Hanging test case `testMaxLogCompactionLag`
> ---
>
> Key: KAFKA-9596
> URL: https://issues.apache.org/jira/browse/KAFKA-9596
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Agam Brahma
>Priority: Major
>
> Saw this on a recent build:
> {code}
> 15:18:59 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag STARTED
> 18:19:25 Build timed out (after 270 minutes). Marking the build as aborted.
> 18:19:25 Build was aborted
> 18:19:25 [FINDBUGS] Skipping publisher since build result is ABORTED
> 18:19:25 Recording test results
> 18:19:25 Setting MAVEN_LATEST__HOME=/home/jenkins/tools/maven/latest/
> 18:19:25 Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3
> 18:19:27 
> 18:19:27 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag SKIPPED
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9596) Hanging test case `testMaxLogCompactionLag`

2020-02-25 Thread Agam Brahma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044950#comment-17044950
 ] 

Agam Brahma commented on KAFKA-9596:


Wasn't able to repro this failure. Passed on trunk (`5216da3d`) with 100 runs:

 

{{$ for i in $(seq 1 100); do ./gradlew :core:cleanTest :core:test --tests 
"kafka.log.LogCleanerIntegrationTest.testMaxLogCompactionLag" | rg 
'testMaxLogCompactionLag'; done > /tmp/testsummary.txt}}

 

{{$ diff <(cat /tmp/testsummary.txt | wc -l) <(rg 'PASS' /tmp/testsummary.txt | 
wc -l) >/dev/null; echo $?}}
{{0}}

 

{{(will leave it open until this test hangs/fails again)}}

 

> Hanging test case `testMaxLogCompactionLag`
> ---
>
> Key: KAFKA-9596
> URL: https://issues.apache.org/jira/browse/KAFKA-9596
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Agam Brahma
>Priority: Major
>
> Saw this on a recent build:
> {code}
> 15:18:59 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag STARTED
> 18:19:25 Build timed out (after 270 minutes). Marking the build as aborted.
> 18:19:25 Build was aborted
> 18:19:25 [FINDBUGS] Skipping publisher since build result is ABORTED
> 18:19:25 Recording test results
> 18:19:25 Setting MAVEN_LATEST__HOME=/home/jenkins/tools/maven/latest/
> 18:19:25 Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3
> 18:19:27 
> 18:19:27 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag SKIPPED
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9573) TestUpgrade system test failed on Java11.

2020-02-25 Thread Nikolay Izhikov (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044913#comment-17044913
 ] 

Nikolay Izhikov commented on KAFKA-9573:


There were some merge conflicts after 
[9d53ad794de45739093070306f44df2a2f31e9e4|https://github.com/apache/kafka/commit/9d53ad794de45739093070306f44df2a2f31e9e4].

I resolved them and reran {{upgrade_test.py}}.
The results are still the same (2 failures):

{noformat}

SESSION REPORT (ALL TESTS)
ducktape version: 0.7.6
session_id:   2020-02-25--001
run time: 120 minutes 48.504 seconds
tests run:31
passed:   29
failed:   2
ignored:  0

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.8.2.2.to_message_format_version=None.compression_types=.none
status: PASS
run time:   5 minutes 52.846 seconds

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.8.2.2.to_message_format_version=None.compression_types=.snappy
status: PASS
run time:   3 minutes 42.862 seconds

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=None.security_protocol=SASL_SSL.compression_types=.none
status: PASS
run time:   5 minutes 48.920 seconds

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.0.1.to_message_format_version=None.compression_types=.lz4
status: PASS
run time:   3 minutes 52.226 seconds

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.0.1.to_message_format_version=None.compression_types=.snappy
status: PASS
run time:   3 minutes 55.741 seconds

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.1.1.to_message_format_version=None.compression_types=.lz4
status: PASS
run time:   3 minutes 56.117 seconds

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.1.1.to_message_format_version=None.compression_types=.snappy
status: PASS
run time:   4 minutes 0.577 seconds

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.2.2.to_message_format_version=0.10.2.2.compression_types=.snappy
status: PASS
run time:   3 minutes 52.603 seconds

test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.10.2.2.to_message_format_version=0.9.0.1.compression_types=.none
status: FAIL
run time:   1 minute 20.193 seconds


Kafka server didn't finish startup in 60 seconds
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 
132, in run
data = self.run_test()
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 
189, in run_test
return self.test_context.function(self.test)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 
428, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/core/upgrade_test.py", line 133, 
in test_upgrade
self.kafka.start()
  File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 250, in 
start
Service.start(self)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/services/service.py", 
line 234, in start
self.start_node(node)
  File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 372, in 
start_node
err_msg="Kafka server didn't finish startup in %d seconds" % timeout_sec)
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/cluster/remoteaccount.py", 
line 705, in wait_until
allow_fail=True) == 0, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 
41, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) e

[jira] [Updated] (KAFKA-9047) AdminClient group operations may not respect backoff

2020-02-25 Thread Sanjana Kaundinya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjana Kaundinya updated KAFKA-9047:
-
Reviewer: Jason Gustafson

> AdminClient group operations may not respect backoff
> 
>
> Key: KAFKA-9047
> URL: https://issues.apache.org/jira/browse/KAFKA-9047
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Jason Gustafson
>Assignee: Sanjana Kaundinya
>Priority: Major
>
> The retry logic for consumer group operations in the admin client is 
> complicated by the need to find the coordinator. Instead of a simple retry 
> loop that sends the same request over and over, we can get more complex 
> retry loops like the following:
>  # Send FindCoordinator to B -> Coordinator is A
>  # Send DescribeGroup to A -> NOT_COORDINATOR
>  # Go back to 1
> Currently we construct a new Call object for each step in this loop, which 
> means we lose some of the retry bookkeeping, such as the last retry time and 
> the number of tries. As a result, it is possible to have tight retry loops 
> that bounce between steps 1 and 2 and do not respect the retry backoff. 
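
Not the actual admin-client code, but a minimal, self-contained sketch of the fix direction: keep one piece of retry bookkeeping alive across the FindCoordinator/DescribeGroups round trip, so the backoff is honoured even though a fresh request object is built for every step. Class and field names here are made up for illustration.

{code:java}
// Simplified stand-ins, not the real Call class in KafkaAdminClient.
public class SharedRetryState {
    static final long RETRY_BACKOFF_MS = 100L;

    int tries = 0;
    long lastAttemptMs = 0L;

    boolean mayRetryNow(long nowMs) {
        // honour the backoff because this state survives both steps of the loop
        return tries == 0 || nowMs - lastAttemptMs >= RETRY_BACKOFF_MS;
    }

    void recordAttempt(long nowMs) {
        tries++;
        lastAttemptMs = nowMs;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedRetryState state = new SharedRetryState();
        for (int i = 0; i < 3; i++) {
            while (!state.mayRetryNow(System.currentTimeMillis())) {
                Thread.sleep(10);   // wait out the backoff instead of spinning
            }
            // step 1: FindCoordinator, step 2: DescribeGroups -> NOT_COORDINATOR
            state.recordAttempt(System.currentTimeMillis());
            System.out.printf("attempt %d (FindCoordinator + DescribeGroups)%n", state.tries);
        }
    }
}
{code}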



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9608) An EOS model simulation test

2020-02-25 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9608:
--

 Summary: An EOS model simulation test
 Key: KAFKA-9608
 URL: https://issues.apache.org/jira/browse/KAFKA-9608
 Project: Kafka
  Issue Type: Improvement
Reporter: Boyang Chen
Assignee: Boyang Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9118) LogDirFailureHandler shouldn't use Zookeeper

2020-02-25 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044899#comment-17044899
 ] 

Boyang Chen commented on KAFKA-9118:


[~viktorsomogyi] Hey Viktor, do you have time to make progress on this ticket? 
If not, I could take a look.

> LogDirFailureHandler shouldn't use Zookeeper
> 
>
> Key: KAFKA-9118
> URL: https://issues.apache.org/jira/browse/KAFKA-9118
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Viktor Somogyi-Vass
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>
> As described in 
> [KIP-112|https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD#KIP-112:HandlediskfailureforJBOD-Zookeeper]:
> {noformat}
> 2. A log directory stops working on a broker during runtime
> - The controller watches the path /log_dir_event_notification for new znode.
> - The broker detects offline log directories during runtime.
> - The broker takes actions as if it has received StopReplicaRequest for this 
> replica. More specifically, the replica is no longer considered leader and is 
> removed from any replica fetcher thread. (The clients will receive a 
> UnknownTopicOrPartitionException at this point)
> - The broker notifies the controller by creating a sequential znode under 
> path /log_dir_event_notification with data of the format {"version" : 1, 
> "broker" : brokerId, "event" : LogDirFailure}.
> - The controller reads the znode to get the brokerId and finds that the event 
> type is LogDirFailure.
> - The controller deletes the notification znode
> - The controller sends LeaderAndIsrRequest to that broker to query the state 
> of all topic partitions on the broker. The LeaderAndIsrResponse from this 
> broker will specify KafkaStorageException for those partitions that are on 
> the bad log directories.
> - The controller updates the information of offline replicas in memory and 
> trigger leader election as appropriate.
> - The controller removes offline replicas from ISR in the ZK and sends 
> LeaderAndIsrRequest with updated ISR to be used by partition leaders.
> - The controller propagates the information of offline replicas to brokers by 
> sending UpdateMetadataRequest.
> {noformat}
> Instead of the notification ZNode we should use a Kafka protocol that sends a 
> notification message to the controller with the offline partitions. The 
> controller then updates the information of offline replicas in memory and 
> triggers leader election, then removes the replicas from the ISR in ZK and 
> sends a LeaderAndIsrRequest and an UpdateMetadataRequest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9607) Should not clear partition group if the task will be revived again

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044886#comment-17044886
 ] 

ASF GitHub Bot commented on KAFKA-9607:
---

abbccdda commented on pull request #8168: KAFKA-9607: Partition group should 
not be cleared if task will be revived
URL: https://github.com/apache/kafka/pull/8168
 
 
   This PR fixes the illegal state bug where a task gets revived but has no 
input partition assigned anymore.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Should not clear partition group if the task will be revived again
> --
>
> Key: KAFKA-9607
> URL: https://issues.apache.org/jira/browse/KAFKA-9607
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> We detected an issue where a corrupted task failed to revive:
> {code:java}
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,137] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle 
> new assignment with:
>         New active tasks: [0_0, 3_1]
>         New standby tasks: []
>         Existing active tasks: [0_0]
>         Existing standby tasks: [] 
> (org.apache.kafka.streams.processor.internals.TaskManager)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,138] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
>  groupId=stream-soak-test] Adding newly assigned partitions: 
> k8sName-id-repartition-1 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,138] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State 
> transition from RUNNING to PARTITIONS_ASSIGNED 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,419] WARN 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException 
> fetching records from restore consumer for partitions 
> [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it 
> is likely that the consumer's position has fallen out of the topic partition 
> offset range because the topic was truncated or compacted on the broker, 
> marking the corresponding tasks as corrupted and re-initializingit later. 
> (org.apache.kafka.streams.processor.internals.StoreChangelogReader)
> [2020-02-25T08:23:38-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,139] INFO 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
>  groupId=stream-soak-test] Setting offset for partition 
> k8sName-id-repartition-1 to the committed offset 
> FetchPosition{offset=3592242, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
>  (id: 1003 rack: null)], epoch=absent}} 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
> 16:23:38,463] ERROR 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> stream-thread 
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
> Encountered the following exception during processing and the thread is going 
> to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00] 
> (streams-soak-trunk_soak_i-06305ad57801079ae_stre

[jira] [Created] (KAFKA-9607) Should not clear partition group if the task will be revived again

2020-02-25 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9607:
--

 Summary: Should not clear partition group if the task will be 
revived again
 Key: KAFKA-9607
 URL: https://issues.apache.org/jira/browse/KAFKA-9607
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Boyang Chen
Assignee: Boyang Chen


We detected an issue where a corrupted task failed to revive:
{code:java}
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,137] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle 
new assignment with:
        New active tasks: [0_0, 3_1]
        New standby tasks: []
        Existing active tasks: [0_0]
        Existing standby tasks: [] 
(org.apache.kafka.streams.processor.internals.TaskManager)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,138] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
 groupId=stream-soak-test] Adding newly assigned partitions: 
k8sName-id-repartition-1 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,138] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State 
transition from RUNNING to PARTITIONS_ASSIGNED 
(org.apache.kafka.streams.processor.internals.StreamThread)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,419] WARN 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException 
fetching records from restore consumer for partitions 
[stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-49-changelog-1], it is 
likely that the consumer's position has fallen out of the topic partition 
offset range because the topic was truncated or compacted on the broker, 
marking the corresponding tasks as corrupted and re-initializingit later. 
(org.apache.kafka.streams.processor.internals.StoreChangelogReader)
[2020-02-25T08:23:38-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,139] INFO 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
[Consumer 
clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
 groupId=stream-soak-test] Setting offset for partition 
k8sName-id-repartition-1 to the committed offset FetchPosition{offset=3592242, 
offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
 (id: 1003 rack: null)], epoch=absent}} 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25 
16:23:38,463] ERROR 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
stream-thread 
[stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] 
Encountered the following exception during processing and the thread is going 
to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
[2020-02-25T08:23:39-08:00] 
(streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) 
java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found.
        at 
org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631)
        at 
org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209)
        at 
org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
        at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725)
{code}
The root cause is that we accidentally clean up the partition group map, so the 
next time we revive the task it is missing its input partitions.
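
A minimal, self-contained illustration of the failure mode, using plain collections as stand-ins for the real Streams classes (names are assumptions): once the bookkeeping for a task's input partitions is dropped, the revived task trips the same "Partition ... not found." IllegalStateException seen in the log above.

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PartitionGroupReviveSketch {
    static final Map<String, Set<String>> partitionGroup = new HashMap<>();

    static void setPartitionTime(String task, String partition) {
        Set<String> partitions = partitionGroup.get(task);
        if (partitions == null || !partitions.contains(partition)) {
            // mirrors the check behind PartitionGroup.setPartitionTime
            throw new IllegalStateException("Partition " + partition + " not found.");
        }
    }

    public static void main(String[] args) {
        partitionGroup.put("0_0", Collections.singleton("k8sName-id-repartition-1"));

        // Buggy path: the group is cleared on corruption even though the task
        // keeps the same input partitions and will be revived.
        partitionGroup.remove("0_0");

        try {
            setPartitionTime("0_0", "k8sName-id-repartition-1"); // revive the task
        } catch (IllegalStateException e) {
            System.out.println("revive failed: " + e.getMessage());
        }
    }
}
{code}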



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9347) Detect deleted log directory before becoming leader

2020-02-25 Thread Agam Brahma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Agam Brahma reassigned KAFKA-9347:
--

Assignee: Agam Brahma

> Detect deleted log directory before becoming leader
> ---
>
> Key: KAFKA-9347
> URL: https://issues.apache.org/jira/browse/KAFKA-9347
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Agam Brahma
>Priority: Major
>  Labels: needs-discussion
>
> There is no protection currently if a broker has had its log directory 
> deleted to prevent it from becoming the leader of a partition that it still 
> remains in the ISR of. This scenario can happen when the last remaining 
> replica in the ISR is shutdown. It will remain in the ISR and be eligible for 
> leadership as soon as it starts up. It would be useful to detect this 
> situation dynamically in order to force the user to either do an unclean 
> election or recover another broker. One option might be just to pass a flag 
> on startup to specify that a broker should not be eligible for leadership. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9603) Number of open files keeps increasing in Streams application

2020-02-25 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044774#comment-17044774
 ] 

Sophie Blee-Goldman commented on KAFKA-9603:


Hey [~biljazovic], can you tell if this happened during restoration or were 
there a large number of unclosed files long after it resumed processing?

The reason I ask is that during restoration Streams toggles on "bulk loading" 
mode, which involves turning off auto-compaction and dumping all the data from 
the changelog into (uncompacted) L0 files. These are the first-level files and 
tend to be relatively small, so you can end up with a lot of them. Once 
restoration is done Streams will issue a manual compaction to merge these into 
a smaller number of larger files, but of course you can hit system limits 
before it completes if you have a lot of data to restore.
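
If the limit is being hit this way, one possible stop-gap (a sketch under the assumption that RocksDB's unlimited open-file default is the culprit; the number 1000 is a placeholder, not a recommendation) is to cap open files per store with a {{RocksDBConfigSetter}}:

{code:java}
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class BoundedOpenFilesConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        options.setMaxOpenFiles(1000); // RocksDB's default is -1 (unlimited)
    }

    @Override
    public void close(final String storeName, final Options options) {
        // no native objects created above, so nothing to release here
    }
}
{code}

It would be registered through the {{rocksdb.config.setter}} Streams config; lowering the limit trades some read performance for bounded file-handle usage while the root cause is investigated.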

> Number of open files keeps increasing in Streams application
> 
>
> Key: KAFKA-9603
> URL: https://issues.apache.org/jira/browse/KAFKA-9603
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.0, 2.3.1
> Environment: Spring Boot 2.2.4, OpenJDK 13, Centos image
>Reporter: Bruno Iljazovic
>Priority: Major
>
> The problem appeared when upgrading from *2.0.1* to *2.3.1*. 
> Relevant Kafka Streams code:
> {code:java}
> KStream events1 =
> builder.stream(FIRST_TOPIC_NAME, Consumed.with(stringSerde, event1Serde, 
> event1TimestampExtractor(), null))
>.mapValues(...);
> KStream events2 =
> builder.stream(SECOND_TOPIC_NAME, Consumed.with(stringSerde, event2Serde, 
> event2TimestampExtractor(), null))
>.mapValues(...);
> var joinWindows = JoinWindows.of(Duration.of(1, MINUTES).toMillis())
>  .until(Duration.of(1, HOURS).toMillis());
> events2.join(events1, this::join, joinWindows, Joined.with(stringSerde, 
> event2Serde, event1Serde))
>.foreach(...);
> {code}
> The number of open *.sst files keeps increasing until it eventually hits the 
> OS limit (65536) and causes this exception:
> {code:java}
> Caused by: org.rocksdb.RocksDBException: While open a file for appending: 
> /.../0_8/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.157943520/001354.sst:
>  Too many open files
>   at org.rocksdb.RocksDB.flush(Native Method)
>   at org.rocksdb.RocksDB.flush(RocksDB.java:2394)
> {code}
> Here are example files that are opened and never closed:
> {code:java}
> /.../0_27/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000114.sst
> /.../0_27/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.158245920/65.sst
> /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158215680/000115.sst
> /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000112.sst
> /.../0_31/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158185440/51.sst
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8147) Add changelog topic configuration to KTable suppress

2020-02-25 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-8147.
-
Resolution: Fixed

Merged as 
https://github.com/apache/kafka/commit/90640266393b530107db8256d38ec5aeba4805e1

> Add changelog topic configuration to KTable suppress
> 
>
> Key: KAFKA-8147
> URL: https://issues.apache.org/jira/browse/KAFKA-8147
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Maarten
>Assignee: highluck
>Priority: Minor
>  Labels: kip
> Fix For: 2.6.0
>
>
> The streams DSL does not provide a way to configure the changelog topic 
> created by KTable.suppress.
> From the perspective of an external user this could be implemented similar to 
> the configuration of aggregate + materialized, i.e.,
> {code:java}
> changelogTopicConfigs = // Configs
> materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs)
> ..
> KGroupedStream.aggregate(..,materialized)
> {code}
> [KIP-446: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress|https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress]
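
For reference, a usage sketch of how the new option might be wired into the DSL, assuming the {{BufferConfig}}-level {{withLoggingEnabled(Map)}} described in KIP-446; the topic configs and the five-minute limit are placeholders, and {{table}} is assumed to come from elsewhere in the topology.

{code:java}
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;

public class SuppressChangelogConfigSketch {

    static <K, V> KTable<K, V> suppressWithTunedChangelog(final KTable<K, V> table) {
        // placeholder changelog topic overrides
        final Map<String, String> changelogTopicConfig = new HashMap<>();
        changelogTopicConfig.put("min.cleanable.dirty.ratio", "0.1");
        changelogTopicConfig.put("segment.bytes", "104857600");

        return table.suppress(Suppressed.untilTimeLimit(
            Duration.ofMinutes(5),
            BufferConfig.unbounded().withLoggingEnabled(changelogTopicConfig)));
    }
}
{code}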



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8147) Add changelog topic configuration to KTable suppress

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044765#comment-17044765
 ] 

ASF GitHub Bot commented on KAFKA-8147:
---

vvcephei commented on pull request #8029: KAFKA-8147: Add changelog topic 
configuration to KTable suppress
URL: https://github.com/apache/kafka/pull/8029
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add changelog topic configuration to KTable suppress
> 
>
> Key: KAFKA-8147
> URL: https://issues.apache.org/jira/browse/KAFKA-8147
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Maarten
>Assignee: highluck
>Priority: Minor
>  Labels: kip
>
> The streams DSL does not provide a way to configure the changelog topic 
> created by KTable.suppress.
> From the perspective of an external user this could be implemented similar to 
> the configuration of aggregate + materialized, i.e.,
> {code:java}
> changelogTopicConfigs = // Configs
> materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs)
> ..
> KGroupedStream.aggregate(..,materialized)
> {code}
> [KIP-446: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress|https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8147) Add changelog topic configuration to KTable suppress

2020-02-25 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-8147:

Fix Version/s: 2.6.0

> Add changelog topic configuration to KTable suppress
> 
>
> Key: KAFKA-8147
> URL: https://issues.apache.org/jira/browse/KAFKA-8147
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Maarten
>Assignee: highluck
>Priority: Minor
>  Labels: kip
> Fix For: 2.6.0
>
>
> The streams DSL does not provide a way to configure the changelog topic 
> created by KTable.suppress.
> From the perspective of an external user this could be implemented similar to 
> the configuration of aggregate + materialized, i.e.,
> {code:java}
> changelogTopicConfigs = // Configs
> materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs)
> ..
> KGroupedStream.aggregate(..,materialized)
> {code}
> [KIP-446: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress|https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9605) EOS Producer could throw illegal state if trying to complete a failed batch after fatal error

2020-02-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9605:
---
Description: 
In the producer, we could see the network client hit a fatal exception while 
trying to complete the batches after the transaction manager hit a fatal fenced 
error:
{code:java}
 
[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,673] ERROR [kafka-producer-network-thread | 
stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
 [Producer 
clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
 transactionalId=stream-soak-test-1_0] Aborting producer batches due to fatal 
error (org.apache.kafka.clients.producer.internals.Sender)
[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) 
org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
operation with an old epoch. Either there is a newer producer with the same 
transactionalId, or the producer's transaction has been expired by the broker.
[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,674] INFO 
[stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3] 
[Producer 
clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-0_0-producer,
 transactionalId=stream-soak-test-0_0] Closing the Kafka producer with 
timeoutMillis = 9223372036854775807 ms. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,684] INFO [kafka-producer-network-thread | 
stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
 [Producer 
clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
 transactionalId=stream-soak-test-1_0] Resetting sequence number of batch with 
current sequence 354277 for partition windowed-node-counts-0 to 354276 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,684] INFO [kafka-producer-network-thread | 
stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
 Resetting sequence number of batch with current sequence 354277 for partition 
windowed-node-counts-0 to 354276 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,685] ERROR [kafka-producer-network-thread | 
stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
 [Producer 
clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
 transactionalId=stream-soak-test-1_0] Uncaught error in request completion: 
(org.apache.kafka.clients.NetworkClient)
[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) 
java.lang.IllegalStateException: Should not reopen a batch which is already 
aborted.
        at 
org.apache.kafka.common.record.MemoryRecordsBuilder.reopenAndRewriteProducerState(MemoryRecordsBuilder.java:295)
        at 
org.apache.kafka.clients.producer.internals.ProducerBatch.resetProducerState(ProducerBatch.java:395)
        at 
org.apache.kafka.clients.producer.internals.TransactionManager.lambda$adjustSequencesDueToFailedBatch$4(TransactionManager.java:770)
        at 
org.apache.kafka.clients.producer.internals.TransactionManager$TopicPartitionEntry.resetSequenceNumbers(TransactionManager.java:180)
        at 
org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:760)
        at 
org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:735)
        at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:671)
        at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:662)
        at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:620)
        at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:554)
        at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:69)
        at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:745)
        at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
        at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:571)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:563)
        at 
org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:304)
        at 
org.apache.kafka.clients.producer.interna

[jira] [Assigned] (KAFKA-9129) Add Thread ID to the InternalProcessorContext

2020-02-25 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna reassigned KAFKA-9129:


Assignee: Bruno Cadonna

> Add Thread ID to the InternalProcessorContext
> -
>
> Key: KAFKA-9129
> URL: https://issues.apache.org/jira/browse/KAFKA-9129
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
>
> When we added client metrics we had to move the {{StreamsMetricsImpl}} object 
> to the client level. That means that instead of having one 
> {{StreamsMetricsImpl}} object per thread, we now have one per client. That 
> also means that we cannot store the thread ID in the {{StreamsMetricsImpl}} 
> anymore. Currently, we get the thread ID from 
> {{Thread.currentThread().getName()}} when we need to create a sensor. 
> However, that is not robust against code refactoring because we need to 
> ensure that the thread that creates the sensor is also the one that records 
> the metrics. To be more flexible, we should expose the ID of the thread that 
> executes a processor in the {{InternalProcessorContext}} like it already 
> exposes the task ID.
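
A small, self-contained illustration (plain Java, not the Streams metrics classes) of why relying on {{Thread.currentThread().getName()}} is fragile: if a refactoring moves sensor creation and metric recording onto different threads, the name captured at creation time no longer identifies the recording thread, which is exactly what exposing the thread ID on the context would avoid.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadNameCaptureSketch {

    static final class Sensor {
        final String creatorThread = Thread.currentThread().getName(); // captured once

        void record() {
            final String recorderThread = Thread.currentThread().getName();
            System.out.printf("created on %s, recorded on %s, match=%b%n",
                creatorThread, recorderThread, creatorThread.equals(recorderThread));
        }
    }

    public static void main(String[] args) throws Exception {
        final Sensor sensor = new Sensor();              // created on "main"
        final ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(sensor::record).get();               // recorded on a pool thread
        pool.shutdown();
    }
}
{code}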



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9606) Document Metrics Changes from KIP-444

2020-02-25 Thread Bruno Cadonna (Jira)
Bruno Cadonna created KAFKA-9606:


 Summary: Document Metrics Changes from KIP-444
 Key: KAFKA-9606
 URL: https://issues.apache.org/jira/browse/KAFKA-9606
 Project: Kafka
  Issue Type: Task
  Components: docs, streams
Affects Versions: 2.5.0
Reporter: Bruno Cadonna
 Fix For: 2.5.0


Changes introduced in KIP-444 shall be documented. See 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%253A+Augment+metrics+for+Kafka+Streams]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9606) Document Metrics Changes from KIP-444

2020-02-25 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna reassigned KAFKA-9606:


Assignee: Bruno Cadonna

> Document Metrics Changes from KIP-444
> -
>
> Key: KAFKA-9606
> URL: https://issues.apache.org/jira/browse/KAFKA-9606
> Project: Kafka
>  Issue Type: Task
>  Components: docs, streams
>Affects Versions: 2.5.0
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 2.5.0
>
>
> Changes introduced in KIP-444 shall be documented. See 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%253A+Augment+metrics+for+Kafka+Streams]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9533) ValueTransform forwards `null` values

2020-02-25 Thread Bill Bejeck (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044725#comment-17044725
 ] 

Bill Bejeck commented on KAFKA-9533:


reverted cherry-picks to 2.5, 2.4, 2.3, and 2.2

> ValueTransform forwards `null` values
> -
>
> Key: KAFKA-9533
> URL: https://issues.apache.org/jira/browse/KAFKA-9533
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Michael Viamari
>Assignee: Michael Viamari
>Priority: Major
> Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1
>
>
> According to the documentation for `KStream#transformValues`, nulls returned 
> from `ValueTransformer#transform` are not forwarded. (see 
> [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-])
> However, this does not appear to be the case. In 
> `KStreamTransformValuesProcessor#process` the result of the transform is 
> forwarded directly.
> {code:java}
>  @Override
>  public void process(final K key, final V value) {
>  context.forward(key, valueTransformer.transform(key, value));
>  }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9533) ValueTransform forwards `null` values

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044724#comment-17044724
 ] 

ASF GitHub Bot commented on KAFKA-9533:
---

bbejeck commented on pull request #8167: KAFKA-9533: Revert  ValueTransform 
forwards `null` 
URL: https://github.com/apache/kafka/pull/8167
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ValueTransform forwards `null` values
> -
>
> Key: KAFKA-9533
> URL: https://issues.apache.org/jira/browse/KAFKA-9533
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Michael Viamari
>Assignee: Michael Viamari
>Priority: Major
> Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1
>
>
> According to the documentation for `KStream#transformValues`, nulls returned 
> from `ValueTransformer#transform` are not forwarded. (see 
> [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-])
> However, this does not appear to be the case. In 
> `KStreamTransformValuesProcessor#process` the result of the transform is 
> forwarded directly.
> {code:java}
>  @Override
>  public void process(final K key, final V value) {
>  context.forward(key, valueTransformer.transform(key, value));
>  }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9533) ValueTransform forwards `null` values

2020-02-25 Thread Michael Viamari (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044719#comment-17044719
 ] 

Michael Viamari commented on KAFKA-9533:


Ok. Thanks for the detailed explanation. I'll take a look at the changes you're 
proposing here as well.

> ValueTransform forwards `null` values
> -
>
> Key: KAFKA-9533
> URL: https://issues.apache.org/jira/browse/KAFKA-9533
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Michael Viamari
>Assignee: Michael Viamari
>Priority: Major
> Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1
>
>
> According to the documentation for `KStream#transformValues`, nulls returned 
> from `ValueTransformer#transform` are not forwarded. (see 
> [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-])
> However, this does not appear to be the case. In 
> `KStreamTransformValuesProcessor#process` the result of the transform is 
> forwarded directly.
> {code:java}
>  @Override
>  public void process(final K key, final V value) {
>  context.forward(key, valueTransformer.transform(key, value));
>  }
> {code}
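
For illustration only, a sketch of the null guard the quoted javadoc wording implies, written against simplified stand-ins rather than the real processor; whether Streams should actually drop nulls here is the open question in this ticket (note the reverted cherry-picks above), so this shows the documented behaviour, not the project's decision.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class NullDroppingForwardSketch<K, V, VR> {

    private final BiFunction<K, V, VR> valueTransformer;  // stand-in for the ValueTransformer
    private final List<VR> forwarded = new ArrayList<>(); // stand-in for context.forward(...)

    public NullDroppingForwardSketch(final BiFunction<K, V, VR> valueTransformer) {
        this.valueTransformer = valueTransformer;
    }

    public void process(final K key, final V value) {
        final VR transformed = valueTransformer.apply(key, value);
        if (transformed != null) {            // drop nulls instead of forwarding them
            forwarded.add(transformed);
        }
    }

    public static void main(String[] args) {
        NullDroppingForwardSketch<String, String, String> p =
            new NullDroppingForwardSketch<>((k, v) -> v == null ? null : v.toUpperCase());
        p.process("a", "kafka");
        p.process("b", null);                 // transform returns null -> not forwarded
        System.out.println(p.forwarded);      // prints [KAFKA]
    }
}
{code}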



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9605) EOS Producer could throw illegal state if trying to complete a failed batch after fatal error

2020-02-25 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9605:
---
Fix Version/s: 2.5.0
Affects Version/s: 2.5.0
   2.4.0

> EOS Producer could throw illegal state if trying to complete a failed batch 
> after fatal error
> -
>
> Key: KAFKA-9605
> URL: https://issues.apache.org/jira/browse/KAFKA-9605
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.5.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.5.0
>
>
> ```
> [2020-02-24T13:23:29-08:00] 
> (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
> 21:23:28,673] ERROR [kafka-producer-network-thread | 
> stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
>  [Producer 
> clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
>  transactionalId=stream-soak-test-1_0] Aborting producer batches due to fatal 
> error (org.apache.kafka.clients.producer.internals.Sender)
> [2020-02-24T13:23:29-08:00] 
> (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) 
> org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
> operation with an old epoch. Either there is a newer producer with the same 
> transactionalId, or the producer's transaction has been expired by the broker.
> [2020-02-24T13:23:29-08:00] 
> (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
> 21:23:28,674] INFO 
> [stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3] 
> [Producer 
> clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-0_0-producer,
>  transactionalId=stream-soak-test-0_0] Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms. 
> (org.apache.kafka.clients.producer.KafkaProducer)
> [2020-02-24T13:23:29-08:00] 
> (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
> 21:23:28,684] INFO [kafka-producer-network-thread | 
> stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
>  [Producer 
> clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
>  transactionalId=stream-soak-test-1_0] Resetting sequence number of batch 
> with current sequence 354277 for partition windowed-node-counts-0 to 354276 
> (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2020-02-24T13:23:29-08:00] 
> (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
> 21:23:28,684] INFO [kafka-producer-network-thread | 
> stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
>  Resetting sequence number of batch with current sequence 354277 for 
> partition windowed-node-counts-0 to 354276 
> (org.apache.kafka.clients.producer.internals.ProducerBatch)
> [2020-02-24T13:23:29-08:00] 
> (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
> 21:23:28,685] ERROR [kafka-producer-network-thread | 
> stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
>  [Producer 
> clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
>  transactionalId=stream-soak-test-1_0] Uncaught error in request completion: 
> (org.apache.kafka.clients.NetworkClient)
> [2020-02-24T13:23:29-08:00] 
> (streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) 
> java.lang.IllegalStateException: Should not reopen a batch which is already 
> aborted.
>         at 
> org.apache.kafka.common.record.MemoryRecordsBuilder.reopenAndRewriteProducerState(MemoryRecordsBuilder.java:295)
>         at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.resetProducerState(ProducerBatch.java:395)
>         at 
> org.apache.kafka.clients.producer.internals.TransactionManager.lambda$adjustSequencesDueToFailedBatch$4(TransactionManager.java:770)
>         at 
> org.apache.kafka.clients.producer.internals.TransactionManager$TopicPartitionEntry.resetSequenceNumbers(TransactionManager.java:180)
>         at 
> org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:760)
>         at 
> org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:735)
>         at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:671)
>         at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:662)
>         at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:620)
>         at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:554)
>

[jira] [Created] (KAFKA-9605) EOS Producer could throw illegal state if trying to complete a failed batch after fatal error

2020-02-25 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9605:
--

 Summary: EOS Producer could throw illegal state if trying to 
complete a failed batch after fatal error
 Key: KAFKA-9605
 URL: https://issues.apache.org/jira/browse/KAFKA-9605
 Project: Kafka
  Issue Type: Bug
Reporter: Boyang Chen
Assignee: Boyang Chen


```

[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,673] ERROR [kafka-producer-network-thread | 
stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
 [Producer 
clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
 transactionalId=stream-soak-test-1_0] Aborting producer batches due to fatal 
error (org.apache.kafka.clients.producer.internals.Sender)

[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) 
org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
operation with an old epoch. Either there is a newer producer with the same 
transactionalId, or the producer's transaction has been expired by the broker.

[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,674] INFO 
[stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3] 
[Producer 
clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-0_0-producer,
 transactionalId=stream-soak-test-0_0] Closing the Kafka producer with 
timeoutMillis = 9223372036854775807 ms. 
(org.apache.kafka.clients.producer.KafkaProducer)

[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,684] INFO [kafka-producer-network-thread | 
stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
 [Producer 
clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
 transactionalId=stream-soak-test-1_0] Resetting sequence number of batch with 
current sequence 354277 for partition windowed-node-counts-0 to 354276 
(org.apache.kafka.clients.producer.internals.TransactionManager)

[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,684] INFO [kafka-producer-network-thread | 
stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
 Resetting sequence number of batch with current sequence 354277 for partition 
windowed-node-counts-0 to 354276 
(org.apache.kafka.clients.producer.internals.ProducerBatch)

[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) [2020-02-24 
21:23:28,685] ERROR [kafka-producer-network-thread | 
stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer]
 [Producer 
clientId=stream-soak-test-5e16fa60-12a3-4c4f-9900-c75f7d10859f-StreamThread-3-1_0-producer,
 transactionalId=stream-soak-test-1_0] Uncaught error in request completion: 
(org.apache.kafka.clients.NetworkClient)

[2020-02-24T13:23:29-08:00] 
(streams-soak-trunk-eos_soak_i-02ea56d369c55eec2_streamslog) 
java.lang.IllegalStateException: Should not reopen a batch which is already 
aborted.

        at 
org.apache.kafka.common.record.MemoryRecordsBuilder.reopenAndRewriteProducerState(MemoryRecordsBuilder.java:295)

        at 
org.apache.kafka.clients.producer.internals.ProducerBatch.resetProducerState(ProducerBatch.java:395)

        at 
org.apache.kafka.clients.producer.internals.TransactionManager.lambda$adjustSequencesDueToFailedBatch$4(TransactionManager.java:770)

        at 
org.apache.kafka.clients.producer.internals.TransactionManager$TopicPartitionEntry.resetSequenceNumbers(TransactionManager.java:180)

        at 
org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:760)

        at 
org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:735)

        at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:671)

        at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:662)

        at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:620)

        at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:554)

        at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:69)

        at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:745)

        at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)

        at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:571)

        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:563)

        at 
org.apache.kafka.clients.producer.internals.Sender.runOnce(S

[jira] [Commented] (KAFKA-9562) Streams not making progress under heavy failures with EOS enabled on 2.5 branch

2020-02-25 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044693#comment-17044693
 ] 

Boyang Chen commented on KAFKA-9562:


We are soaking the 2.5 branch, with and without EOS, for another 2 weeks to make 
sure the Streams side is in good shape.

> Streams not making progress under heavy failures with EOS enabled on 2.5 
> branch
> ---
>
> Key: KAFKA-9562
> URL: https://issues.apache.org/jira/browse/KAFKA-9562
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.5.0
>Reporter: John Roesler
>Assignee: Boyang Chen
>Priority: Blocker
> Fix For: 2.5.0
>
>
> During soak testing in preparation for the 2.5.0 release, we have discovered 
> a case in which Streams appears to stop making progress. Specifically, this 
> is a failure-resilience test in which we inject network faults separating the 
> instances from the brokers roughly every twenty minutes.
> On 2.4, Streams would obviously spend a lot of time rebalancing under this 
> scenario, but would still make progress. However, on the current 2.5 branch, 
> Streams effectively stops making progress except rarely.
> This appears to be a severe regression, so I'm filing this ticket as a 2.5.0 
> release blocker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9596) Hanging test case `testMaxLogCompactionLag`

2020-02-25 Thread Agam Brahma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Agam Brahma reassigned KAFKA-9596:
--

Assignee: Agam Brahma

> Hanging test case `testMaxLogCompactionLag`
> ---
>
> Key: KAFKA-9596
> URL: https://issues.apache.org/jira/browse/KAFKA-9596
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Agam Brahma
>Priority: Major
>
> Saw this on a recent build:
> {code}
> 15:18:59 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag STARTED
> 18:19:25 Build timed out (after 270 minutes). Marking the build as aborted.
> 18:19:25 Build was aborted
> 18:19:25 [FINDBUGS] Skipping publisher since build result is ABORTED
> 18:19:25 Recording test results
> 18:19:25 Setting MAVEN_LATEST__HOME=/home/jenkins/tools/maven/latest/
> 18:19:25 Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3
> 18:19:27 
> 18:19:27 kafka.log.LogCleanerIntegrationTest > testMaxLogCompactionLag SKIPPED
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7787) Add error specifications to KAFKA-7609

2020-02-25 Thread Tom Bentley (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044630#comment-17044630
 ] 

Tom Bentley commented on KAFKA-7787:


[~cmccabe] any feedback on my comment?

> Add error specifications to KAFKA-7609
> --
>
> Key: KAFKA-7787
> URL: https://issues.apache.org/jira/browse/KAFKA-7787
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Colin McCabe
>Assignee: Tom Bentley
>Priority: Minor
>
> In our RPC JSON, it would be nice if we could specify what versions of a 
> response could contain what errors.  See the discussion here: 
> https://github.com/apache/kafka/pull/5893#discussion_r244841051



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9308) Misses SAN after certificate creation

2020-02-25 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-9308.
---
Fix Version/s: 2.6.0
   Resolution: Fixed

> Misses SAN after certificate creation
> -
>
> Key: KAFKA-9308
> URL: https://issues.apache.org/jira/browse/KAFKA-9308
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.3.1
>Reporter: Agostino Sarubbo
>Assignee: Sönke Liebau
>Priority: Minor
> Fix For: 2.6.0
>
>
> Hello,
> I followed the documentation to use Kafka with SSL; however, by the end of the 
> procedure the specified SAN has been lost.
> To test, run (after the first keytool command and again after the last one):
>  
> {code:java}
> keytool -list -v -keystore server.keystore.jks
> {code}
> Reference:
>  [http://kafka.apache.org/documentation.html#security_ssl]
>  
> {code:java}
> #!/bin/bash
> #Step 1
> keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg 
> RSA -genkey -ext SAN=DNS:test.test.com
> #Step 2
> openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
> keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
> keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
> #Step 3
> keytool -keystore server.keystore.jks -alias localhost -certreq -file 
> cert-file 
> openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed 
> -days 365 -CAcreateserial -passin pass:test1234 
> keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert 
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}
>  
> Specifically, the SAN is lost after:
> {code:java}
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}
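
For reference, here is a minimal Java sketch that checks whether the SAN survived 
the import. The keystore file name and alias come from the script above, while the 
keystore password ("test1234") is an assumption for illustration. If the reported 
behaviour occurs, the printed value is null after the final import of cert-signed.

{code:java}
import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.cert.X509Certificate;

public class CheckSan {
    public static void main(String[] args) throws Exception {
        KeyStore keyStore = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("server.keystore.jks")) {
            // Keystore password is assumed for this sketch; adjust to your setup.
            keyStore.load(in, "test1234".toCharArray());
        }
        X509Certificate cert =
            (X509Certificate) keyStore.getCertificate("localhost");
        // getSubjectAlternativeNames() returns null when the extension is absent,
        // which is the symptom described in this ticket.
        System.out.println("SANs: " + cert.getSubjectAlternativeNames());
    }
}
{code}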



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9308) Misses SAN after certificate creation

2020-02-25 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison reassigned KAFKA-9308:
-

Assignee: Sönke Liebau

> Misses SAN after certificate creation
> -
>
> Key: KAFKA-9308
> URL: https://issues.apache.org/jira/browse/KAFKA-9308
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.3.1
>Reporter: Agostino Sarubbo
>Assignee: Sönke Liebau
>Priority: Minor
>
> Hello,
> I followed the documentation to use Kafka with SSL; however, by the end of the 
> procedure the specified SAN has been lost.
> To test, run (after the first keytool command and again after the last one):
>  
> {code:java}
> keytool -list -v -keystore server.keystore.jks
> {code}
> Reference:
>  [http://kafka.apache.org/documentation.html#security_ssl]
>  
> {code:java}
> #!/bin/bash
> #Step 1
> keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg 
> RSA -genkey -ext SAN=DNS:test.test.com
> #Step 2
> openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
> keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
> keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
> #Step 3
> keytool -keystore server.keystore.jks -alias localhost -certreq -file 
> cert-file 
> openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed 
> -days 365 -CAcreateserial -passin pass:test1234 
> keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert 
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}
>  
> Specifically, the SAN is lost after:
> {code:java}
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9066) Kafka Connect JMX : source & sink task metrics missing for tasks in failed state

2020-02-25 Thread Simon Trigona (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044624#comment-17044624
 ] 

Simon Trigona commented on KAFKA-9066:
--

We're also experiencing this.

> Kafka Connect JMX : source & sink task metrics missing for tasks in failed 
> state
> 
>
> Key: KAFKA-9066
> URL: https://issues.apache.org/jira/browse/KAFKA-9066
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.1.1
>Reporter: Mikołaj Stefaniak
>Priority: Major
>
> h2. Overview
> Kafka Connect exposes various metrics via JMX. Those metrics can be exported, 
> e.g. by the _Prometheus JMX Exporter_, for further processing.
> One of the crucial attributes is the connector's *task status*.
> According to the official Kafka docs, the status is available as the +status+ 
> attribute of the following MBean:
> {quote}kafka.connect:type=connector-task-metrics,connector="\{connector}",task="\{task}"status
>  - The status of the connector task. One of 'unassigned', 'running', 
> 'paused', 'failed', or 'destroyed'.
> {quote}
> h2. Issue
> Generally, +connector-task-metrics+ are exposed properly for tasks in +running+ 
> status, but they are not exposed at all if the task is +failed+.
> A failed task *appears* properly, with failed status, when queried via the *REST API*:
>  
> {code:java}
> $ curl -X GET -u 'user:pass' 
> http://kafka-connect.mydomain.com/connectors/customerconnector/status
> {"name":"customerconnector","connector":{"state":"RUNNING","worker_id":"kafka-connect.mydomain.com:8080"},"tasks":[{"id":0,"state":"FAILED","worker_id":"kafka-connect.mydomain.com:8080","trace":"org.apache.kafka.connect.errors.ConnectException:
>  Received DML 'DELETE FROM mysql.rds_sysinfo .."}],"type":"source"}
> $ {code}
>  
> A failed task *doesn't appear* as a bean with the +connector-task-metrics+ type 
> when queried via *JMX*:
>  
> {code:java}
> $ echo "beans -d kafka.connect" | java -jar 
> target/jmxterm-1.1.0-SNAPSHOT-uber.jar -l localhost:8081 -n -v silent | grep 
> connector=customerconnector
> kafka.connect:connector=customerconnector,task=0,type=task-error-metrics
> kafka.connect:connector=customerconnector,type=connector-metrics
> $
> {code}
> h2. Expected result
> It is expected that a bean with the +connector-task-metrics+ type will also 
> appear for tasks that have failed.
> Below is an example of how beans are properly registered for tasks in the 
> Running state:
>  
> {code:java}
> $ echo "get -b 
> kafka.connect:connector=sinkConsentSubscription-1000,task=0,type=connector-task-metrics
>  status" | java -jar target/jmxterm-1.1.0-SNAPSHOT-ube r.jar -l 
> localhost:8081 -n -v silent
> status = running;
> $
> {code}
>  
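
For reference, a minimal Java sketch that lists the +connector-task-metrics+ MBeans 
a Connect worker currently exposes; the JMX host and port (localhost:8081) are 
assumptions borrowed from the jmxterm example above, so adjust them to the worker's 
actual JMX endpoint. Per this ticket, beans for FAILED tasks are missing from the 
result even though the REST API reports those tasks.

{code:java}
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListConnectorTaskMetrics {
    public static void main(String[] args) throws Exception {
        // Host/port are assumptions; point this at the Connect worker's JMX port.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:8081/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            Set<ObjectName> beans = connection.queryNames(
                new ObjectName("kafka.connect:type=connector-task-metrics,*"), null);
            // Beans for tasks in FAILED state would be expected here as well.
            beans.forEach(System.out::println);
        }
    }
}
{code}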



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-9533) ValueTransform forwards `null` values

2020-02-25 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler reopened KAFKA-9533:
-

> ValueTransform forwards `null` values
> -
>
> Key: KAFKA-9533
> URL: https://issues.apache.org/jira/browse/KAFKA-9533
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Michael Viamari
>Assignee: Michael Viamari
>Priority: Major
> Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1
>
>
> According to the documentation for `KStream#transformValues`, nulls returned 
> from `ValueTransformer#transform` are not forwarded. (see 
> [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-])
> However, this does not appear to be the case. In 
> `KStreamTransformValuesProcessor#process` the result of the transform is 
> forwarded directly.
> {code:java}
>  @Override
>  public void process(final K key, final V value) {
>  context.forward(key, valueTransformer.transform(key, value));
>  }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9533) ValueTransform forwards `null` values

2020-02-25 Thread John Roesler (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044602#comment-17044602
 ] 

John Roesler commented on KAFKA-9533:
-

Hi [~mviamari] ,

Since merging this fix into trunk, we've discovered some use cases that 
actually depend on the prior behavior. In retrospect, I think that we should 
fix this issue by correcting the docs, not changing the behavior.

 
I bet this started with a simple copy/paste error from the {{transform}} API, 
in which returning {{null}} (as opposed to a {{KeyValue}}) does mean dropping 
the record. In that case it makes sense, since we cannot process a record with 
no key: the choice is between throwing an NPE and dropping the record, so we 
drop the record. But this doesn't apply to {{transformValues}}, because the 
result would have the same key as the input.
 
Really, it's murky territory either way... Since we have {{transform}} and 
{{flatTransform}}, as well as the {{xValues}} overloads, the implication is that 
{{transform}} is one-in-to-one-out and {{flatTransform}} is 
one-in-to-zero-or-more-out. In that case, if you did want to drop an input 
record, you should use {{flatTransform}} and return an empty collection. 
There _should_ be an invariant for {{transform}} and {{transformValues}} that 
you get exactly the same number of output records as input records, whether 
their values are {{null}} or otherwise.
 
Since it seems like the docs, not the behavior, are wrong, and since this is 
pre-existing behavior going back a long way in Streams (so changing it would 
break some users' code), we should go ahead and back out the PR. Sorry for the 
confusion.
 
As far as the actual fix goes, I'd be in favor of amending the docs to state 
that:
1. When {{KStream#transformValues}} returns {{null}}, there will simply be an 
output record with a {{null}} value.
2. When {{KStream#transform}} returns {{null}}, we consider that to be an 
invalid return, and we will log a warning while dropping the record (similar to 
other APIs in Streams)
3. When {{KStream#transform}} returns a {{KeyValue(key, null)}}, there will 
simply be an output record with a null value.
4. When {{KStream#flatTransform}} and {{KStream#flatTransformValues}} return 
{{null}}, we consider that to be equivalent to returning an empty collection, 
in which case, we don't forward any results, and also do not log a warning. 
(I'd also be in favor of logging a warning here for consistency, but it seems 
like overkill).
 
Accordingly, I'm going to re-open this ticket, and Bill will revert the changes 
to fix the downstream builds. Sorry again for the trouble.
-John
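
To make the distinction above concrete, here is a minimal Kafka Streams sketch 
(topic names and the default serde setup are assumed for illustration): with 
{{transformValues}}, a {{null}} return is forwarded as a record with a {{null}} 
value, while dropping a record is done with {{flatTransformValues}} by returning 
an empty collection.

{code:java}
import java.util.Collections;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueTransformer;
import org.apache.kafka.streams.processor.ProcessorContext;

public class NullForwardingSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic"); // assumed topic

        // One-in-one-out: a null result becomes an output record with a null value.
        input.transformValues(() -> new ValueTransformer<String, String>() {
            @Override public void init(final ProcessorContext context) { }
            @Override public String transform(final String value) {
                return (value == null || value.isEmpty()) ? null : value.toUpperCase();
            }
            @Override public void close() { }
        }).to("upper-or-null-topic");

        // One-in-zero-or-more-out: return an empty collection to drop the record.
        input.flatTransformValues(() -> new ValueTransformer<String, Iterable<String>>() {
            @Override public void init(final ProcessorContext context) { }
            @Override public Iterable<String> transform(final String value) {
                return (value == null || value.isEmpty())
                        ? Collections.emptyList()
                        : Collections.singletonList(value.toUpperCase());
            }
            @Override public void close() { }
        }).to("upper-only-topic");
    }
}
{code}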

> ValueTransform forwards `null` values
> -
>
> Key: KAFKA-9533
> URL: https://issues.apache.org/jira/browse/KAFKA-9533
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0, 0.10.2.2, 0.11.0.3, 1.1.1, 2.0.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Michael Viamari
>Assignee: Michael Viamari
>Priority: Major
> Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1
>
>
> According to the documentation for `KStream#transformValues`, nulls returned 
> from `ValueTransformer#transform` are not forwarded. (see 
> [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-])
> However, this does not appear to be the case. In 
> `KStreamTransformValuesProcessor#process` the result of the transform is 
> forwarded directly.
> {code:java}
>  @Override
>  public void process(final K key, final V value) {
>  context.forward(key, valueTransformer.transform(key, value));
>  }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9567) Docs and system tests for ZooKeeper 3.5.7 and KIP-515

2020-02-25 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-9567.
--
Fix Version/s: 2.5.0
   Resolution: Fixed

Issue resolved by pull request 8132
[https://github.com/apache/kafka/pull/8132]

> Docs and system tests for ZooKeeper 3.5.7 and KIP-515
> -
>
> Key: KAFKA-9567
> URL: https://issues.apache.org/jira/browse/KAFKA-9567
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Ron Dagostino
>Priority: Blocker
> Fix For: 2.5.0
>
>
> These changes depend on [KIP-515: Enable ZK client to use the new TLS 
> supported 
> authentication|https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Enable+ZK+client+to+use+the+new+TLS+supported+authentication],
>  which was only added to 2.5.0.  The upgrade to ZooKeeper 3.5.7 was merged to 
> both 2.5.0 and 2.4.1 via https://issues.apache.org/jira/browse/KAFKA-9515, 
> but this change must only be merged to 2.5.0 (it will break the system tests 
> if merged to 2.4.1).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9567) Docs and system tests for ZooKeeper 3.5.7 and KIP-515

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044511#comment-17044511
 ] 

ASF GitHub Bot commented on KAFKA-9567:
---

omkreddy commented on pull request #8132: KAFKA-9567: Docs, system tests for 
ZooKeeper 3.5.7
URL: https://github.com/apache/kafka/pull/8132
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Docs and system tests for ZooKeeper 3.5.7 and KIP-515
> -
>
> Key: KAFKA-9567
> URL: https://issues.apache.org/jira/browse/KAFKA-9567
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Ron Dagostino
>Priority: Blocker
>
> These changes depend on [KIP-515: Enable ZK client to use the new TLS 
> supported 
> authentication|https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Enable+ZK+client+to+use+the+new+TLS+supported+authentication],
>  which was only added to 2.5.0.  The upgrade to ZooKeeper 3.5.7 was merged to 
> both 2.5.0 and 2.4.1 via https://issues.apache.org/jira/browse/KAFKA-9515, 
> but this change must only be merged to 2.5.0 (it will break the system tests 
> if merged to 2.4.1).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9604) Cluster crash

2020-02-25 Thread Maksim Larionov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maksim Larionov updated KAFKA-9604:
---
Description: 
Good day!

One of the servers in the cluster ran out of disk space. During cleanup, some 
*.log files of some replicas in log.dirs were deleted by mistake. When the 
retention time was reached, the log cleanup ran and the physical file 
07607076.log was not found. The broker shut down abnormally.
 [2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base 
offsets [7607076] due to retention time 60480ms breach (kafka.log.Log)
 [2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 
7607076, size 131228281] for deletion. (kafka.log.Log)
 [2020-02-06 13:32:48,979] ERROR Error while deleting segments for 
ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data 
(kafka.server.LogDirFailureChannel)
 java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 Suppressed: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 -> 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted
 [2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving 
replicas in dir /data/ocswf/kafka_broker/kafka-data 
(kafka.server.ReplicaManager)
 [2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task 
'kafka-log-retention' (kafka.utils.KafkaScheduler)
 org.apache.kafka.common.errors.KafkaStorageException: Error while deleting 
segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data
 Caused by: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 Suppressed: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 -> 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted
 ...
 [2020-02-06 13:32:49,058] INFO Stopping serving logs in dir 
/data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
 [2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in 
/data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)

 

Then all the remaining cluster nodes also shut down abnormally during partition 
leader election:
 [2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making 
broker the leader for partition Topic: ocs.counter-balances; Partition: 40; 
Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir 
Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager)
 org.apache.kafka.common.errors.KafkaStorageException: Error while writing to 
checkpoint file 
/data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint
 Caused by: java.io.FileNotFoundException: 
/data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp
 (No such file or directory)
 They could not rewrite the leader-epoch-checkpoint and stopped for that reason.
 [2020-02-06 13:32:53,687] INFO Stopping serving logs in dir 
/data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
 [2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in 
/data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
 Is this situation expected behavior?

 

  was:
Good day!

One of the servers in the cluster ran out of disk space. During cleanup, some 
*.log files of some replicas in log.dirs were deleted by mistake. When the 
retention time was reached, the log cleanup ran and the physical file 
07607076.log was not found. The broker shut down abnormally.
 [2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base 
offsets [7607076] due to retention time 60480ms breach (kafka.log.Log)
 [2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 
7607076, size 131228281] for deletion. (kafka.log.Log)
 [2020-02-06 13:32:48,979] ERROR Error while deleting segments for 
ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data 
(kafka.server.LogDirFailureChannel)
 java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 Suppressed: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 -> 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted
 [2020-02-06 13:32:48,982] INFO [ReplicaMana

[jira] [Updated] (KAFKA-9604) Cluster crash

2020-02-25 Thread Maksim Larionov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maksim Larionov updated KAFKA-9604:
---
Description: 
Good day!

One of the servers in the cluster ran out of disk space. During cleanup, some 
*.log files of some replicas in log.dirs were deleted by mistake. When the 
retention time was reached, the log cleanup ran and the physical file 
07607076.log was not found. The broker shut down abnormally.
 [2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base 
offsets [7607076] due to retention time 60480ms breach (kafka.log.Log)
 [2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 
7607076, size 131228281] for deletion. (kafka.log.Log)
 [2020-02-06 13:32:48,979] ERROR Error while deleting segments for 
ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data 
(kafka.server.LogDirFailureChannel)
 java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 Suppressed: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 -> 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted
 [2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving 
replicas in dir /data/ocswf/kafka_broker/kafka-data 
(kafka.server.ReplicaManager)
 [2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task 
'kafka-log-retention' (kafka.utils.KafkaScheduler)
 org.apache.kafka.common.errors.KafkaStorageException: Error while deleting 
segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data
 Caused by: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 Suppressed: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 -> 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted
 ...
 [2020-02-06 13:32:49,058] INFO Stopping serving logs in dir 
/data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
 [2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in 
/data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)


 Then all the remaining cluster nodes also shut down abnormally during partition 
leader election:
 [2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making 
broker the leader for partition Topic: ocs.counter-balances; Partition: 40; 
Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir 
Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager)
 org.apache.kafka.common.errors.KafkaStorageException: Error while writing to 
checkpoint file 
/data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint
 Caused by: java.io.FileNotFoundException: 
/data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp
 (No such file or directory)
 They could not rewrite the leader-epoch-checkpoint and stopped for that reason.
 [2020-02-06 13:32:53,687] INFO Stopping serving logs in dir 
/data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
 [2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in 
/data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
 Is this situation expected behavior?

 

  was:
Good day!

One of the servers in the cluster ran out of disk space. During cleanup, some 
*.log files of some replicas in log.dirs were deleted by mistake. When the 
retention time was reached, the log cleanup ran and the physical file 
07607076.log was not found. The broker shut down abnormally.
[2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base 
offsets [7607076] due to retention time 60480ms breach (kafka.log.Log)
[2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 
7607076, size 131228281] for deletion. (kafka.log.Log)
[2020-02-06 13:32:48,979] ERROR Error while deleting segments for 
ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data 
(kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
Suppressed: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 -> 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted
[2020-02-06 13:32:48,982] INFO [ReplicaManager bro

[jira] [Commented] (KAFKA-3881) Remove the replacing logic from "." to "_" in Fetcher

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044383#comment-17044383
 ] 

ASF GitHub Bot commented on KAFKA-3881:
---

tombentley commented on pull request #3407: KAFKA-3881: Remove the replacing 
logic from "." to "_" in Fetcher
URL: https://github.com/apache/kafka/pull/3407
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove the replacing logic from "." to "_" in Fetcher
> -
>
> Key: KAFKA-3881
> URL: https://issues.apache.org/jira/browse/KAFKA-3881
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, metrics
>Reporter: Guozhang Wang
>Assignee: Tom Bentley
>Priority: Major
>  Labels: newbie, patch-available
>
> The logic of replacing "." with "_" in metric names / tags was originally 
> introduced in the core package's metrics because Graphite treats "." as a 
> hierarchy separator (see KAFKA-1902); for the client metrics, the 
> GraphiteReporter is supposed to take care of this itself rather than having 
> Kafka metrics special-case it. In addition, right now only the consumer 
> Fetcher does the replacement, while the producer Sender does not.
> So we should consider removing this logic from the consumer Fetcher's metrics 
> package. NOTE that this is a public, backward-incompatible API change.
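
As a hypothetical illustration of the suggested direction (not Kafka code), the 
replacement would then live at the Graphite-facing reporter boundary rather than 
inside the consumer Fetcher's metrics; the class and example name below are 
purely illustrative.

{code:java}
// Hypothetical reporter-side sanitization: rewrite '.' (e.g. from topic names
// used in metric names/tags) only when emitting to Graphite.
public class GraphiteNameSanitizer {
    static String sanitize(final String metricName) {
        return metricName.replace('.', '_');
    }

    public static void main(String[] args) {
        System.out.println(sanitize("my.topic.name")); // prints my_topic_name
    }
}
{code}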



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-5554) Hilight config settings for particular common use cases

2020-02-25 Thread Tom Bentley (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Bentley resolved KAFKA-5554.

Resolution: Abandoned

> Hilight config settings for particular common use cases
> ---
>
> Key: KAFKA-5554
> URL: https://issues.apache.org/jira/browse/KAFKA-5554
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Tom Bentley
>Assignee: Tom Bentley
>Priority: Minor
>
> Judging by the sorts of questions seen on the mailing list, Stack Overflow, 
> etc., it seems common for users to assume that Kafka will default to settings 
> which won't lose messages. They start using Kafka and at some later time find 
> that messages have been lost.
> While it's not our fault if users don't read the documentation, there's a lot 
> of configuration documentation to digest and it's easy for people to miss an 
> important setting.
> Therefore, I'd like to suggest that in addition to the current configuration 
> docs we add a short section highlighting those settings which pertain to 
> common use cases, such as:
> * configs to avoid lost messages
> * configs for low latency
> I'm sure some users will continue to not read the documentation, but when 
> they inevitably start asking questions it means people can respond with "have 
> you configured everything as described here?"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-6359) Work for KIP-236

2020-02-25 Thread Tom Bentley (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Bentley resolved KAFKA-6359.

Resolution: Implemented

This was addressed by KIP-455 instead.

> Work for KIP-236
> 
>
> Key: KAFKA-6359
> URL: https://issues.apache.org/jira/browse/KAFKA-6359
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Tom Bentley
>Assignee: GEORGE LI
>Priority: Minor
>
> This issue is for the work described in KIP-236.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5517) Support linking to particular configuration parameters

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044373#comment-17044373
 ] 

ASF GitHub Bot commented on KAFKA-5517:
---

tombentley commented on pull request #3436: KAFKA-5517: Add id to config HTML 
tables to allow linking
URL: https://github.com/apache/kafka/pull/3436
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support linking to particular configuration parameters
> --
>
> Key: KAFKA-5517
> URL: https://issues.apache.org/jira/browse/KAFKA-5517
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Tom Bentley
>Assignee: Tom Bentley
>Priority: Minor
>  Labels: patch-available
>
> Currently the configuration parameters are documented in long tables, and it's 
> only possible to link to the heading before a particular table. When 
> discussing configuration parameters on forums it would be helpful to be able 
> to link to the particular parameter under discussion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5554) Hilight config settings for particular common use cases

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044371#comment-17044371
 ] 

ASF GitHub Bot commented on KAFKA-5554:
---

tombentley commented on pull request #3486: KAFKA-5554 doc config common
URL: https://github.com/apache/kafka/pull/3486
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Hilight config settings for particular common use cases
> ---
>
> Key: KAFKA-5554
> URL: https://issues.apache.org/jira/browse/KAFKA-5554
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Tom Bentley
>Assignee: Tom Bentley
>Priority: Minor
>
> Judging by the sorts of questions seen on the mailing list, Stack Overflow, 
> etc., it seems common for users to assume that Kafka will default to settings 
> which won't lose messages. They start using Kafka and at some later time find 
> that messages have been lost.
> While it's not our fault if users don't read the documentation, there's a lot 
> of configuration documentation to digest and it's easy for people to miss an 
> important setting.
> Therefore, I'd like to suggest that in addition to the current configuration 
> docs we add a short section highlighting those settings which pertain to 
> common use cases, such as:
> * configs to avoid lost messages
> * configs for low latency
> I'm sure some users will continue to not read the documentation, but when 
> they inevitably start asking questions it means people can respond with "have 
> you configured everything as described here?"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6359) Work for KIP-236

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044367#comment-17044367
 ] 

ASF GitHub Bot commented on KAFKA-6359:
---

tombentley commented on pull request #4330: [WIP] KAFKA-6359: KIP-236 
interruptible reassignments
URL: https://github.com/apache/kafka/pull/4330
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Work for KIP-236
> 
>
> Key: KAFKA-6359
> URL: https://issues.apache.org/jira/browse/KAFKA-6359
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Tom Bentley
>Assignee: GEORGE LI
>Priority: Minor
>
> This issue is for the work described in KIP-236.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9604) Cluster crash

2020-02-25 Thread Maksim Larionov (Jira)
Maksim Larionov created KAFKA-9604:
--

 Summary: Cluster crash
 Key: KAFKA-9604
 URL: https://issues.apache.org/jira/browse/KAFKA-9604
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.2.1
Reporter: Maksim Larionov


Good day!

One of the servers in the cluster ran out of disk space. During cleanup, some 
*.log files of some replicas in log.dirs were deleted by mistake. When the 
retention time was reached, the log cleanup ran and the physical file 
07607076.log was not found. The broker shut down abnormally.
[2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base 
offsets [7607076] due to retention time 60480ms breach (kafka.log.Log)
[2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, 
dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 
7607076, size 131228281] for deletion. (kafka.log.Log)
[2020-02-06 13:32:48,979] ERROR Error while deleting segments for 
ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data 
(kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
Suppressed: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 -> 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted
[2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving 
replicas in dir /data/ocswf/kafka_broker/kafka-data 
(kafka.server.ReplicaManager)
[2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task 
'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting 
segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data
Caused by: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
Suppressed: java.nio.file.NoSuchFileException: 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log
 -> 
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/07607076.log.deleted
...
[2020-02-06 13:32:49,058] INFO Stopping serving logs in dir 
/data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
[2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in 
/data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
Then all the remaining cluster nodes also shut down abnormally during partition 
leader election:
[2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making 
broker the leader for partition Topic: ocs.counter-balances; Partition: 40; 
Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir 
Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while writing to 
checkpoint file 
/data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint
Caused by: java.io.FileNotFoundException: 
/data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp
 (No such file or directory)
They could not rewrite the leader-epoch-checkpoint and stopped for that reason.
[2020-02-06 13:32:53,687] INFO Stopping serving logs in dir 
/data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
[2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in 
/data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
Is this situation expected behavior?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9572) Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some Records

2020-02-25 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044308#comment-17044308
 ] 

Bruno Cadonna commented on KAFKA-9572:
--

Sounds good to me.

It is a pity that we cannot backport the potential fix, but I see that it would 
be really hard.  

> Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some 
> Records
> ---
>
> Key: KAFKA-9572
> URL: https://issues.apache.org/jira/browse/KAFKA-9572
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.0
>Reporter: Bruno Cadonna
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 2.5.0
>
> Attachments: 7-changelog-1.txt, data-1.txt, streams22.log, 
> streams23.log, streams30.log, sum-1.txt
>
>
> System test {{StreamsEosTest.test_failure_and_recovery}} failed due to a 
> wrongly computed aggregation under exactly-once (EOS). The specific error is:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Result verification 
> failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset 
> = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value 
> size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = 
> [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698>
>   at 
> org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444)
>   at 
> org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196)
>   at 
> org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69)
> {code} 
> That means, the sum computed by the Streams app seems to be wrong for key 
> 6069. I checked the dumps of the log segments of the input topic partition 
> (attached: data-1.txt) and indeed two input records are not considered in the 
> sum. With those two missed records the sum would be correct. More concretely, 
> the input values for key 6069 are:
> # 147
> # 9250
> # 5340 
> # 1231
> # 1301
> The sum of these values is 17269, which is the expected sum stated in the 
> exception above. If you subtract values 3 and 4, i.e. 5340 and 1231, from 
> 17269, you get 10698, which is the actual sum in the exception above. Somehow 
> those two values are missing.
> In the log dump of the output topic partition (attached: sum-1.txt), the sum 
> is correct until the 4th value 1231, i.e. 15968; then it is overwritten with 
> 10698.
> In the log dump of the changelog topic of the state store that stores the sum 
> (attached: 7-changelog-1.txt), the sum is also overwritten as in the output 
> topic.
> I attached the logs of the three Streams instances involved.
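
For what it's worth, the arithmetic quoted in the description checks out; a 
trivial verification of the figures above (no assumptions beyond the values 
listed in the ticket):

{code:java}
public class SumCheck {
    public static void main(String[] args) {
        final int[] inputs = {147, 9250, 5340, 1231, 1301};
        int full = 0;
        for (final int v : inputs) {
            full += v;
        }
        System.out.println(full);                     // 17269, the expected sum
        System.out.println(full - 5340 - 1231);       // 10698, the actual (wrong) sum
        System.out.println(147 + 9250 + 5340 + 1231); // 15968, last correct output value
    }
}
{code}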



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9603) Number of open files keeps increasing in Streams application

2020-02-25 Thread Bruno Iljazovic (Jira)
Bruno Iljazovic created KAFKA-9603:
--

 Summary: Number of open files keeps increasing in Streams 
application
 Key: KAFKA-9603
 URL: https://issues.apache.org/jira/browse/KAFKA-9603
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.3.1, 2.4.0
 Environment: Spring Boot 2.2.4, OpenJDK 13, Centos image
Reporter: Bruno Iljazovic


The problem appeared when upgrading from *2.0.1* to *2.3.1*. 

Relevant Kafka Streams code:
{code:java}
KStream events1 =
builder.stream(FIRST_TOPIC_NAME, Consumed.with(stringSerde, event1Serde, 
event1TimestampExtractor(), null))
   .mapValues(...);

KStream events2 =
builder.stream(SECOND_TOPIC_NAME, Consumed.with(stringSerde, event2Serde, 
event2TimestampExtractor(), null))
   .mapValues(...);

var joinWindows = JoinWindows.of(Duration.of(1, MINUTES).toMillis())
 .until(Duration.of(1, HOURS).toMillis());

events2.join(events1, this::join, joinWindows, Joined.with(stringSerde, 
event2Serde, event1Serde))
   .foreach(...);
{code}
The number of open *.sst files keeps increasing until it eventually hits the OS 
limit (65536) and causes this exception:
{code:java}
Caused by: org.rocksdb.RocksDBException: While open a file for appending: 
/.../0_8/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.157943520/001354.sst:
 Too many open files
at org.rocksdb.RocksDB.flush(Native Method)
at org.rocksdb.RocksDB.flush(RocksDB.java:2394)
{code}
Here are example files that are opened and never closed:
{code:java}
/.../0_27/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000114.sst
/.../0_27/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.158245920/65.sst
/.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158215680/000115.sst
/.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000112.sst
/.../0_31/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158185440/51.sst
{code}
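
One mitigation sometimes used for this class of problem (a sketch, not a confirmed 
fix for this ticket) is to cap RocksDB's open file handles through a custom 
{{RocksDBConfigSetter}}; the class name and the limit of 3000 below are 
illustrative assumptions.

{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class CappedOpenFilesConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // RocksDB's default of -1 means "unlimited"; a positive value makes
        // RocksDB evict table readers once the cap is reached.
        options.setMaxOpenFiles(3000);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Nothing custom to release; the Options instance is managed by Streams.
    }
}
{code}
The setter would be registered via {{StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG}} 
in the application's properties.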



--
This message was sent by Atlassian Jira
(v8.3.4#803005)