[jira] [Commented] (KAFKA-8705) NullPointerException was thrown by topology optimization when two MergeNodes have a common KeyChangingNode

2019-07-24 Thread Hiroshi Nakahara (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892339#comment-16892339 ]

Hiroshi Nakahara commented on KAFKA-8705:
-

[~bbejeck] Thank you for working on it quickly!  I'm happy to hear about that :)

> NullPointerException was thrown by topology optimization when two MergeNodes 
> have a common KeyChangingNode
> -
>
> Key: KAFKA-8705
> URL: https://issues.apache.org/jira/browse/KAFKA-8705
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Hiroshi Nakahara
>Assignee: Bill Bejeck
>Priority: Major
>
> NullPointerException was thrown by topology optimization when two MergeNodes 
> have a common KeyChangingNode.
> Kafka Streams version: 2.3.0
> h3. Code
> {code:java}
> import org.apache.kafka.common.serialization.Serdes;
> import org.apache.kafka.streams.StreamsBuilder;
> import org.apache.kafka.streams.StreamsConfig;
> import org.apache.kafka.streams.kstream.Consumed;
> import org.apache.kafka.streams.kstream.KStream;
>
> import java.util.Properties;
>
> public class Main {
>     public static void main(String[] args) {
>         final StreamsBuilder streamsBuilder = new StreamsBuilder();
>         final KStream<Integer, Integer> parentStream = streamsBuilder
>             .stream("parentTopic", Consumed.with(Serdes.Integer(), Serdes.Integer()))
>             .selectKey(Integer::sum);  // to make parentStream a key-changing node
>         final KStream<Integer, Integer> childStream1 = parentStream.mapValues(v -> v + 1);
>         final KStream<Integer, Integer> childStream2 = parentStream.mapValues(v -> v + 2);
>         final KStream<Integer, Integer> childStream3 = parentStream.mapValues(v -> v + 3);
>         childStream1
>             .merge(childStream2)
>             .merge(childStream3)
>             .to("outputTopic");
>         final Properties properties = new Properties();
>         properties.setProperty(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
>         streamsBuilder.build(properties);
>     }
> }
> {code}
> h3. Expected result
> streamsBuilder.build should create the Topology without throwing an 
> exception. The expected topology is:
> {code:java}
> Topologies:
>    Sub-topology: 0
>     Source: KSTREAM-SOURCE-0000000000 (topics: [parentTopic])
>       --> KSTREAM-KEY-SELECT-0000000001
>     Processor: KSTREAM-KEY-SELECT-0000000001 (stores: [])
>       --> KSTREAM-MAPVALUES-0000000002, KSTREAM-MAPVALUES-0000000003, KSTREAM-MAPVALUES-0000000004
>       <-- KSTREAM-SOURCE-0000000000
>     Processor: KSTREAM-MAPVALUES-0000000002 (stores: [])
>       --> KSTREAM-MERGE-0000000005
>       <-- KSTREAM-KEY-SELECT-0000000001
>     Processor: KSTREAM-MAPVALUES-0000000003 (stores: [])
>       --> KSTREAM-MERGE-0000000005
>       <-- KSTREAM-KEY-SELECT-0000000001
>     Processor: KSTREAM-MAPVALUES-0000000004 (stores: [])
>       --> KSTREAM-MERGE-0000000006
>       <-- KSTREAM-KEY-SELECT-0000000001
>     Processor: KSTREAM-MERGE-0000000005 (stores: [])
>       --> KSTREAM-MERGE-0000000006
>       <-- KSTREAM-MAPVALUES-0000000002, KSTREAM-MAPVALUES-0000000003
>     Processor: KSTREAM-MERGE-0000000006 (stores: [])
>       --> KSTREAM-SINK-0000000007
>       <-- KSTREAM-MERGE-0000000005, KSTREAM-MAPVALUES-0000000004
>     Sink: KSTREAM-SINK-0000000007 (topic: outputTopic)
>       <-- KSTREAM-MERGE-0000000006
> {code}
> h3. Actual result
> NullPointerException was thrown with the following stacktrace.
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
>   at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>   at org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybeUpdateKeyChangingRepartitionNodeMap(InternalStreamsBuilder.java:397)
>   at org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybeOptimizeRepartitionOperations(InternalStreamsBuilder.java:315)
>   at org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybePerformOptimizations(InternalStreamsBuilder.java:304)
>   at org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.buildAndOptimizeTopology(InternalStreamsBuilder.java:275)
>   at org.apache.kafka.streams.StreamsBuilder.build(StreamsBuilder.java:558)
>   at Main.main(Main.java:24){code}
> h3. Cause
> This exception occurs in 
> InternalStreamsBuilder#maybeUpdateKeyChangingRepartitionNodeMap.
> {code:java}
> private void maybeUpdateKeyChangingRepartitionNodeMap() {
>     final Map<StreamsGraphNode, Set<StreamsGraphNode>> mergeNodesToKeyChangers = new HashMap<>();
>     for (final StreamsGraphNode mergeNode : mergeNodes) {
>         mergeNodesToKeyChangers.put(mergeNode, new LinkedHashSet<>());
>         final Collection<StreamsGraphNode> keys = 
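The quoted method is cut off in the archive. As a minimal, self-contained sketch of the failure pattern implied by the stack trace and by the fix's title ("remove parent node after leaving loop"); the names below are placeholders, not the actual Streams internals. Handling the first merge node removes the shared key-changing entry inside the loop, so the lookup for the second merge node yields null and addAll(null) throws.

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MergeNodeNpeSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for the builder's key-changer bookkeeping:
        // one key-changing node shared by two merge nodes.
        final Map<String, Set<String>> keyChangerMap = new HashMap<>();
        keyChangerMap.put("KSTREAM-KEY-SELECT-0000000001",
                new LinkedHashSet<>(Collections.singleton("repartition-node")));

        final List<String> mergeNodes =
                Arrays.asList("KSTREAM-MERGE-0000000005", "KSTREAM-MERGE-0000000006");
        for (final String mergeNode : mergeNodes) {
            // Removing the shared entry inside the loop, as the pre-fix code
            // effectively did, leaves nothing for the second merge node ...
            final Set<String> repartitionNodes =
                    keyChangerMap.remove("KSTREAM-KEY-SELECT-0000000001");
            final Set<String> collected = new LinkedHashSet<>();
            // ... so addAll(null) throws on the second pass, matching the
            // AbstractCollection.addAll frame at the top of the stack trace.
            collected.addAll(repartitionNodes);
        }
    }
}
{code}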

[jira] [Commented] (KAFKA-8078) Flaky Test TableTableJoinIntegrationTest#testInnerInner

2019-07-24 Thread Boyang Chen (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892326#comment-16892326 ]

Boyang Chen commented on KAFKA-8078:


[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/23646/consoleFull]

*22:05:50* org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > testInnerLeft[caching enabled = true] STARTED
*22:06:32* org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testInnerLeft[caching enabled = true] failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11@2/streams/build/reports/testOutput/org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testInnerLeft[caching enabled = true].test.stdout
*22:06:32* org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > testInnerLeft[caching enabled = true] FAILED
*22:06:32* java.lang.AssertionError: Condition not met within timeout 15000. Never received expected final result.
*22:06:32*     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:376)
*22:06:32*     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:336)
*22:06:32*     at org.apache.kafka.streams.integration.AbstractJoinIntegrationTest.runTest(AbstractJoinIntegrationTest.java:255)
*22:06:32*     at org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testInnerLeft(TableTableJoinIntegrationTest.java:244)

*22:06:32* org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > testOuterInner[caching enabled = true] STARTED
*22:07:16* org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testOuterInner[caching enabled = true] failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11@2/streams/build/reports/testOutput/org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testOuterInner[caching enabled = true].test.stdout
*22:07:16* org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > testOuterInner[caching enabled = true] FAILED
*22:07:16* java.lang.AssertionError: Condition not met within timeout 15000. Never received expected final result.
*22:07:16*     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:376)
*22:07:16*     at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:336)
*22:07:16*     at org.apache.kafka.streams.integration.AbstractJoinIntegrationTest.runTest(AbstractJoinIntegrationTest.java:255)
*22:07:16*     at org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testOuterInner(TableTableJoinIntegrationTest.java:482)

> Flaky Test TableTableJoinIntegrationTest#testInnerInner
> ---
>
> Key: KAFKA-8078
> URL: https://issues.apache.org/jira/browse/KAFKA-8078
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: flaky-test
> Fix For: 2.4.0
>
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3445/tests]
> {quote}java.lang.AssertionError: Condition not met within timeout 15000. 
> Never received expected final result.
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:365)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325)
> at 
> org.apache.kafka.streams.integration.AbstractJoinIntegrationTest.runTest(AbstractJoinIntegrationTest.java:246)
> at 
> org.apache.kafka.streams.integration.TableTableJoinIntegrationTest.testInnerInner(TableTableJoinIntegrationTest.java:196){quote}
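For context on the assertion: the failure comes from a polling test helper that retries a condition until a timeout elapses. A schematic version of such a helper (not the actual org.apache.kafka.test.TestUtils, whose signatures differ) that fails with the same "Condition not met within timeout" message:

{code:java}
import java.util.function.BooleanSupplier;

public final class WaitUtil {
    // Schematic polling helper: retry until the condition holds or the
    // timeout elapses, then fail with the message seen in the logs above.
    public static void waitForCondition(final BooleanSupplier condition,
                                        final long timeoutMs,
                                        final String message) throws InterruptedException {
        final long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError(
                        "Condition not met within timeout " + timeoutMs + ". " + message);
            }
            Thread.sleep(100);  // poll interval; a placeholder value
        }
    }
}
{code}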





[jira] [Commented] (KAFKA-7937) Flaky Test ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup

2019-07-24 Thread Boyang Chen (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892325#comment-16892325 ]

Boyang Chen commented on KAFKA-7937:


[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/23646/consoleFull]

> Flaky Test ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup
> 
>
> Key: KAFKA-7937
> URL: https://issues.apache.org/jira/browse/KAFKA-7937
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, clients, unit tests
>Affects Versions: 2.2.0, 2.1.1, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Gwen Shapira
>Priority: Critical
> Fix For: 2.4.0
>
> Attachments: log-job6122.txt
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for 
> all observed test failures.
> https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/19/pipeline
> {quote}kafka.admin.ResetConsumerGroupOffsetTest > 
> testResetOffsetsNotExistingGroup FAILED 
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available. at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>  at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService.resetOffsets(ConsumerGroupCommand.scala:306)
>  at 
> kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup(ResetConsumerGroupOffsetTest.scala:89)
>  Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: 
> The coordinator is not available.{quote}





[jira] [Commented] (KAFKA-8179) Incremental Rebalance Protocol for Kafka Consumer

2019-07-24 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892297#comment-16892297 ]

ASF GitHub Bot commented on KAFKA-8179:
---

ableegoldman commented on pull request #7100: KAFKA-8179: KIP-429, new 
PartitionAssignor interface
URL: https://github.com/apache/kafka/pull/7100
 
 
   
 



> Incremental Rebalance Protocol for Kafka Consumer
> -
>
> Key: KAFKA-8179
> URL: https://issues.apache.org/jira/browse/KAFKA-8179
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>
> Recently the Kafka community has been promoting cooperative rebalancing to 
> mitigate the pain points of the stop-the-world rebalancing protocol. This 
> ticket initiates that idea in the Kafka consumer client, which will benefit 
> heavily stateful consumers such as Kafka Streams applications.
> In short, the scope of this ticket includes reducing unnecessary rebalance 
> latency due to heavy partition migration, i.e. partitions being revoked and 
> re-assigned. This would make the built-in consumer assignors (range, 
> round-robin, etc.) aware of previously assigned partitions and sticky on a 
> best-effort basis.
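A minimal sketch of opting a plain consumer into the cooperative, sticky behavior this ticket describes, using the CooperativeStickyAssignor that KIP-429 eventually shipped in Kafka 2.4 (bootstrap address, group id, and topic below are placeholders):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CooperativeConsumerSketch {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Cooperative rebalancing: only the partitions that actually move are
        // revoked, instead of stop-the-world revocation of every partition.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            // poll loop elided
        }
    }
}
{code}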





[jira] [Commented] (KAFKA-8179) Incremental Rebalance Protocol for Kafka Consumer

2019-07-24 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892296#comment-16892296 ]

ASF GitHub Bot commented on KAFKA-8179:
---

ableegoldman commented on pull request #7107: KAFKA-8179: 
PartitionAssignorAdapter for backwards compatibility
URL: https://github.com/apache/kafka/pull/7107
 
 
   
 








[jira] [Updated] (KAFKA-8703) Move PartitionAssignor to public API

2019-07-24 Thread Sophie Blee-Goldman (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8703:
---
Description: 
Currently the PartitionAssignor, which is meant to be a pluggable interface, 
sits in the internal package. It should be part of the public API, so we are 
deprecating the old consumer.internal.PartitionAssignor in favor of a new 
consumer.ConsumerPartitionAssignor.

We also want to take the opportunity to refactor the interface a bit, so as to 
achieve an easier-to-evolve API going forward.

Due to the way assignors are instantiated, moving to a new 
ConsumerPartitionAssignor interface will be fully compatible for most users 
except those who have implemented the internal.PartitionAssignor (see 
KAFKA-8704)

  was:
Currently the PartitionAssignor, which is meant to be a pluggable interface, 
sits in the internal package. It should be part of the public API, so we are 
deprecating the old consumer.internal.PartitionAssignor in favor of a new 
consumer.PartitionAssignor.

We also want to take the opportunity to refactor the interface a bit, so as to 
achieve
 # Better separation of user/assignor and consumer provided metadata
 # Easier to evolve API

Due to the way assignors are instantiated, moving to a new PartitionAssignor 
interface will be fully compatible for most users except those who have 
implemented the internal.PartitionAssignor (see KAFKA-8704)


> Move PartitionAssignor to public API
> 
>
> Key: KAFKA-8703
> URL: https://issues.apache.org/jira/browse/KAFKA-8703
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients
>Reporter: Sophie Blee-Goldman
>Assignee: Sophie Blee-Goldman
>Priority: Major
>
> Currently the PartitionAssignor, which is meant to be a pluggable interface, 
> sits in the internal package. It should be part of the public API, so we are 
> deprecating the old consumer.internal.PartitionAssignor in favor of a new 
> consumer.ConsumerPartitionAssignor.
> We also want to take the opportunity to refactor the interface a bit, so as 
> to achieve an easier-to-evolve API going forward.
> Due to the way assignors are instantiated, moving to a new 
> ConsumerPartitionAssignor interface will be fully compatible for most users 
> except those who have implemented the internal.PartitionAssignor (see 
> KAFKA-8704)
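As a sketch of what implementing the new public interface looks like, based on the API as it eventually shipped in Kafka 2.4 (the deliberately trivial assignment strategy is illustrative only, not from this ticket):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerPartitionAssignor;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.TopicPartition;

public class ExampleAssignor implements ConsumerPartitionAssignor {

    @Override
    public String name() {
        return "example";  // protocol name sent to the group coordinator
    }

    @Override
    public GroupAssignment assign(final Cluster metadata, final GroupSubscription groupSubscription) {
        // Deliberately trivial strategy: hand partition 0 of each subscribed
        // topic to every member. A real assignor balances across members.
        final Map<String, Assignment> assignments = new HashMap<>();
        for (final Map.Entry<String, Subscription> member
                : groupSubscription.groupSubscription().entrySet()) {
            final List<TopicPartition> partitions = new ArrayList<>();
            for (final String topic : member.getValue().topics()) {
                partitions.add(new TopicPartition(topic, 0));
            }
            assignments.put(member.getKey(), new Assignment(partitions));
        }
        return new GroupAssignment(assignments);
    }
}
{code}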





[jira] [Assigned] (KAFKA-8704) Add PartitionAssignor adapter for backwards compatibility

2019-07-24 Thread Sophie Blee-Goldman (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman reassigned KAFKA-8704:
--

Assignee: Sophie Blee-Goldman

> Add PartitionAssignor adapter for backwards compatibility
> -
>
> Key: KAFKA-8704
> URL: https://issues.apache.org/jira/browse/KAFKA-8704
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients
>Reporter: Sophie Blee-Goldman
>Assignee: Sophie Blee-Goldman
>Priority: Major
>
> As part of KIP-429, we are deprecating the old 
> consumer.internal.PartitionAssignor in favor of a [new 
> consumer.PartitionAssignor|https://issues.apache.org/jira/browse/KAFKA-8703] 
> interface that is part of the public API.
>  
> Although the old PartitionAssignor was technically part of the internal 
> package, some users may have implemented it and this change will break source 
> compatibility for them as they would need to modify their class to implement 
> the new interface. The number of users affected may be small, but nonetheless 
> we would like to add an adapter to maintain compatibility for these users.
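Schematically, such an adapter wraps a legacy implementation behind the new interface; the types below are illustrative placeholders, not the actual Kafka classes:

{code:java}
import java.util.List;
import java.util.Map;

// Illustrative placeholder interfaces standing in for the old internal
// PartitionAssignor and the new public one.
interface LegacyAssignor {
    String name();
    Map<String, List<Integer>> assign(Map<String, List<String>> subscriptionsByMember);
}

interface NewAssignor {
    String name();
    Map<String, List<Integer>> assign(Map<String, List<String>> subscriptionsByMember);
}

// The adapter satisfies the new interface by delegating to a legacy
// implementation, so existing user classes keep working unmodified.
final class AssignorAdapter implements NewAssignor {
    private final LegacyAssignor delegate;

    AssignorAdapter(final LegacyAssignor delegate) {
        this.delegate = delegate;
    }

    @Override
    public String name() {
        return delegate.name();
    }

    @Override
    public Map<String, List<Integer>> assign(final Map<String, List<String>> subscriptionsByMember) {
        // A real adapter converts between old and new request/response types
        // here; in this schematic the shapes happen to coincide.
        return delegate.assign(subscriptionsByMember);
    }
}
{code}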





[jira] [Commented] (KAFKA-8179) Incremental Rebalance Protocol for Kafka Consumer

2019-07-24 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892294#comment-16892294 ]

ASF GitHub Bot commented on KAFKA-8179:
---

ableegoldman commented on pull request #7110:  KAFKA-8179: 
PartitionAssignorAdapter
URL: https://github.com/apache/kafka/pull/7110
 
 
   Follow up to [new PartitionAssignor 
interface](https://issues.apache.org/jira/browse/KAFKA-8703) -- should be 
rebased after [7108](https://github.com/apache/kafka/pull/7108) is merged
   
   Adds a PartitionAssignorAdapter class to [maintain backwards 
compatibility](https://issues.apache.org/jira/browse/KAFKA-8704)
   
 








[jira] [Updated] (KAFKA-8712) Build Kafka Streams against Scala 2.13

2019-07-24 Thread Loic DIVAD (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic DIVAD updated KAFKA-8712:
--
Description: Since Scala 2.13 has been released and KAFKA-7197 brings its 
support, the library \{{kafka-streams-scala}} could be also compiled for Scala 
2.13.   (was: Since Scala 2.13 has been released and KAFKA-7197 brings its 
support, the library {{kafka-streams-scala }}could be also compiled for Scala 
2.13. )

> Build Kafka Streams against Scala 2.13
> --
>
> Key: KAFKA-8712
> URL: https://issues.apache.org/jira/browse/KAFKA-8712
> Project: Kafka
>  Issue Type: Wish
>  Components: streams
>Reporter: Loic DIVAD
>Priority: Minor
> Fix For: 2.4.0
>
>
> Since Scala 2.13 has been released and KAFKA-7197 brings its support, the 
> library \{{kafka-streams-scala}} could be also compiled for Scala 2.13. 





[jira] [Commented] (KAFKA-8712) Build Kafka Streams against Scala 2.13

2019-07-24 Thread Ismael Juma (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892261#comment-16892261 ]

Ismael Juma commented on KAFKA-8712:


That's my expectation and hence why I was asking the question. There could be a 
bug somewhere.






[jira] [Commented] (KAFKA-8712) Build Kafka Streams against Scala 2.13

2019-07-24 Thread Matthias J. Sax (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892260#comment-16892260 ]

Matthias J. Sax commented on KAFKA-8712:


[~ijuma] Are you saying that we don't need to do anything and this request is 
already covered by KAFKA-7197?






[jira] [Updated] (KAFKA-8712) Build Kafka Streams against Scala 2.13

2019-07-24 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8712:
---
Fix Version/s: 2.4.0






[jira] [Updated] (KAFKA-8712) Build Kafka Streams against Scala 2.13

2019-07-24 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8712:
---
Affects Version/s: (was: 2.3.0)

> Build Kafka Streams against Scala 2.13
> --
>
> Key: KAFKA-8712
> URL: https://issues.apache.org/jira/browse/KAFKA-8712
> Project: Kafka
>  Issue Type: Wish
>  Components: streams
>Reporter: Loic DIVAD
>Priority: Minor
>
> Since Scala 2.13 has been released and KAFKA-7197 brings its support, the 
> library {{kafka-streams-scala }}could be also compiled for Scala 2.13. 





[jira] [Commented] (KAFKA-8712) Build Kafka Streams against Scala 2.13

2019-07-24 Thread Ismael Juma (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892257#comment-16892257 ]

Ismael Juma commented on KAFKA-8712:


It should be. Do you have reason to believe it's not?

> Build Kafka Streams against Scala 2.13
> --
>
> Key: KAFKA-8712
> URL: https://issues.apache.org/jira/browse/KAFKA-8712
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Loic DIVAD
>Priority: Minor
>
> Since Scala 2.13 has been released and KAFKA-7197 brings its support, the 
> library {{kafka-streams-scala }}could be also compiled for Scala 2.13. 





[jira] [Updated] (KAFKA-8712) Build Kafka Streams against Scala 2.13

2019-07-24 Thread Loic DIVAD (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Loic DIVAD updated KAFKA-8712:
--
Issue Type: Wish  (was: New Feature)

> Build Kafka Streams against Scala 2.13
> --
>
> Key: KAFKA-8712
> URL: https://issues.apache.org/jira/browse/KAFKA-8712
> Project: Kafka
>  Issue Type: Wish
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Loic DIVAD
>Priority: Minor
>
> Since Scala 2.13 has been released and KAFKA-7197 brings its support, the 
> library {{kafka-streams-scala }}could be also compiled for Scala 2.13. 





[jira] [Created] (KAFKA-8712) Build Kafka Streams against Scala 2.13

2019-07-24 Thread Loic DIVAD (JIRA)
Loic DIVAD created KAFKA-8712:
-

 Summary: Build Kafka Streams against Scala 2.13
 Key: KAFKA-8712
 URL: https://issues.apache.org/jira/browse/KAFKA-8712
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Affects Versions: 2.3.0
Reporter: Loic DIVAD


Since Scala 2.13 has been released and KAFKA-7197 brings its support, the 
library {{kafka-streams-scala }}could be also compiled for Scala 2.13. 





[jira] [Commented] (KAFKA-8705) NullPointerException was thrown by topology optimization when two MergeNodes have a common KeyChangingNode

2019-07-24 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892228#comment-16892228 ]

ASF GitHub Bot commented on KAFKA-8705:
---

bbejeck commented on pull request #7109: KAFKA-8705: Remove parent node after 
leaving loop to prevent NPE
URL: https://github.com/apache/kafka/pull/7109
 
 
   
 




[jira] [Commented] (KAFKA-8705) NullPointerException was thrown by topology optimization when two MergeNodes have a common KeyChangingNode

2019-07-24 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892227#comment-16892227 ]

ASF GitHub Bot commented on KAFKA-8705:
---

bbejeck commented on pull request #7109: KAFKA-8705: Remove parent node after 
leaving loop to prevent NPE
URL: https://github.com/apache/kafka/pull/7109
 
 
   Fixes a case where multiple children merged from a key-changing node cause an 
NPE.
   
   I've updated the tests.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 




[jira] [Commented] (KAFKA-8179) Incremental Rebalance Protocol for Kafka Consumer

2019-07-24 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892184#comment-16892184 ]

ASF GitHub Bot commented on KAFKA-8179:
---

ableegoldman commented on pull request #7108: KAFKA-8179: add public 
ConsumerPartitionAssignor interface
URL: https://github.com/apache/kafka/pull/7108
 
 
   [Subtask JIRA](https://issues.apache.org/jira/browse/KAFKA-8703)
   Main changes of this PR
   * Deprecate old consumer.internal.PartitionAssignor and add public 
consumer.ConsumerPartitionAssignor with all OOTB assignors migrated to new 
interface
   * Refactor assignor's assignment/subscription related classes for easier to 
evolve API
   * Removed version number from classes as it is only needed for 
serialization/deserialization
   
   Other previously-discussed cleanup included in this PR:
   * Remove Assignment.error added in [pt 
1](https://github.com/apache/kafka/pull/6528/files)
   * Remove ConsumerCoordinator#adjustAssignment added in [pt 
2](https://github.com/apache/kafka/pull/6778/)
   
 








[jira] [Commented] (KAFKA-8673) Kafka stream threads stuck while sending offsets to transaction preventing join group from completing

2019-07-24 Thread Guozhang Wang (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892132#comment-16892132 ]

Guozhang Wang commented on KAFKA-8673:
--

Also cc [~bob-barrett] [~hachikuji], who have been looking into KIP-360 recently: I 
will dig deeper into the producer's transaction manager, but so far I think 
the infinite retry logic in `TxnRequestHandler` is sound, so it is a mystery 
why the request did not go through after the brokers became available.

> Kafka stream threads stuck while sending offsets to transaction preventing 
> join group from completing
> -
>
> Key: KAFKA-8673
> URL: https://issues.apache.org/jira/browse/KAFKA-8673
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, streams
>Affects Versions: 2.2.0
>Reporter: Varsha Abhinandan
>Priority: Major
> Attachments: Screen Shot 2019-07-11 at 12.08.09 PM.png
>
>
> We observed a deadlock-like situation in our Kafka Streams application 
> when we accidentally shut down all the brokers. The Kafka cluster was brought 
> back in about an hour. 
> Observations made:
>  # Normal Kafka producers and consumers started working fine after the 
> brokers were up again. 
>  # The Kafka streams applications were stuck in the "rebalancing" state.
>  # The Kafka streams apps have exactly-once semantics enabled.
>  # The stack trace showed most of the stream threads sending the join group 
> requests to the group co-ordinator.
>  # A few stream threads couldn't initiate the join group request since the 
> call to org.apache.kafka.clients.producer.KafkaProducer#sendOffsetsToTransaction 
> was stuck.
>  # It seems the join group requests were getting parked at the coordinator 
> since the expected members hadn't sent their own group join requests.
>  # After the timeout, the stream threads that were not stuck sent new 
> join group requests.  
>  # (6) and (7) may be repeating indefinitely.
>  # Sample values of the GroupMetadata object on the group co-ordinator: 
> [^Screen Shot 2019-07-11 at 12.08.09 PM.png]
>  # The list of notYetJoinedMembers client ids matched the threads 
> waiting for their offsets to be committed. 
> {code:java}
> [List(MemberMetadata(memberId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-38-consumer-efa41349-3da1-43b6-9710-a662f68c63b1,
>  
> clientId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-38-consumer,
>  clientHost=/10.136.98.48, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36-consumer-7cc8e41b-ad98-4006-a18a-b22abe6350f4,
>  
> clientId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36-consumer,
>  clientHost=/10.136.103.148, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-27-consumer-9ffb96c1-3379-4cbd-bee1-5d4719fe6c9d,
>  
> clientId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-27-consumer,
>  clientHost=/10.136.98.48, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21-consumer-5b8a1f1f-84dd-4a87-86c8-7542c0e50d1f,
>  
> clientId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21-consumer,
>  clientHost=/10.136.103.148, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-994cee9b-b79b-483b-97cd-f89e8cbb015a-StreamThread-33-consumer-3cb67ec9-c548-4386-962d-64d9772bf719,
>  
> clientId=metric-extractor-stream-c1-994cee9b-b79b-483b-97cd-f89e8cbb015a-StreamThread-33-consumer,
>  clientHost=/10.136.99.15, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ))]
> vabhinandan-mac:mp-jstack varsha.abhinandan$ cat jstack.* | grep 
> "metric-extractor-stream-c1-" | grep "StreamThread-" | grep "waiting on 
> condition"
> "metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36"
>  #128 daemon prio=5 os_prio=0 tid=0x7fc53c047800 nid=0xac waiting on 
> condition [0x7fc4e68e7000]
> "metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21"
>  #93 daemon prio=5 os_prio=0 tid=0x7fc53c2b5800 nid=0x9d waiting on 
> condition [0x7fc4e77f6000]
> 
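For reference, the blocking call from observation 5 sits on the exactly-once commit path. A minimal sketch of that path using the producer API of that era (topic, group, transactional id, and offset below are placeholders; error handling elided):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringSerializer;

public class EosCommitSketch {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "example-txn-id");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            // ... produce records here ...
            // This call blocks until the transaction coordinator acknowledges
            // the offsets; per the report, stream threads were stuck here
            // after the full-cluster outage, so they never rejoined the group.
            producer.sendOffsetsToTransaction(
                    Collections.singletonMap(new TopicPartition("input-topic", 0),
                            new OffsetAndMetadata(42L)),
                    "example-group");
            producer.commitTransaction();
        }
    }
}
{code}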

[jira] [Commented] (KAFKA-8673) Kafka stream threads stuck while sending offsets to transaction preventing join group from completing

2019-07-24 Thread Guozhang Wang (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892129#comment-16892129 ]

Guozhang Wang commented on KAFKA-8673:
--

Yes, but my main point is that the `AddOffsetsToTxn` request should not be blocked 
forever once the brokers are back online, and therefore those threads would 
eventually rejoin the group and unblock the others.


[jira] [Commented] (KAFKA-8711) Kafka 2.3.0 Transient Unit Test Failures SocketServerTest. testControlPlaneRequest

2019-07-24 Thread Chandrasekhar (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892118#comment-16892118 ]

Chandrasekhar commented on KAFKA-8711:
--

Attached are the logs. Please let us know if we can comment out this test case 
for this build and move forward until there is a release available with this 
fix. If so, where is this code in the repo, and how should we run a subset of 
the test cases rather than waiting ~2 hours to find the failures?

Thanks much,

Chandra

[^KafkaAUTFailures07242019_PASS2.txt]!KafkaUTFailures07242019_PASS2.GIF!

> Kafka 2.3.0 Transient Unit Test Failures SocketServerTest. 
> testControlPlaneRequest
> --
>
> Key: KAFKA-8711
> URL: https://issues.apache.org/jira/browse/KAFKA-8711
> Project: Kafka
>  Issue Type: Test
>  Components: core, unit tests
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Critical
> Attachments: KafkaAUTFailures07242019_PASS2.txt, 
> KafkaUTFailures07242019_PASS2.GIF
>
>
> Cloned Kafka 2.3.0 source to our git repo and compiled it using 'gradle 
> build'; we see the following error consistently:
> Gradle Version 4.7
>  
> testControlPlaneRequest
> java.net.BindException: Address already in use (Bind failed)
>     at java.net.PlainSocketImpl.socketBind(Native Method)
>     at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>     at java.net.Socket.bind(Socket.java:644)
>     at java.net.Socket.<init>(Socket.java:433)
>     at java.net.Socket.<init>(Socket.java:286)
>     at kafka.network.SocketServerTest.connect(SocketServerTest.scala:140)
>     at 
> kafka.network.SocketServerTest.$anonfun$testControlPlaneRequest$1(SocketServerTest.scala:200)
>     at 
> kafka.network.SocketServerTest.$anonfun$testControlPlaneRequest$1$adapted(SocketServerTest.scala:199)
>     at 
> kafka.network.SocketServerTest.withTestableServer(SocketServerTest.scala:1141)
>     at 
> kafka.network.SocketServerTest.testControlPlaneRequest(SocketServerTest.scala:199)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>     at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>     at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>     at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>     at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>     at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>     at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at 
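The failure itself is a transient port conflict when a test binds a fixed port. Not necessarily what the Kafka test does, but the usual mitigation for this class of flakiness is to bind port 0 and let the OS pick a free ephemeral port, as in this sketch:

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortSketch {
    public static void main(final String[] args) throws IOException {
        // Binding to port 0 asks the OS for a currently free ephemeral port,
        // avoiding "Address already in use" races between concurrent tests.
        try (ServerSocket socket = new ServerSocket(0)) {
            System.out.println("bound to free port " + socket.getLocalPort());
        }
    }
}
{code}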

[jira] [Updated] (KAFKA-8711) Kafka 2.3.0 Transient Unit Test Failures SocketServerTest. testControlPlaneRequest

2019-07-24 Thread Chandrasekhar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar updated KAFKA-8711:
-
Attachment: KafkaUTFailures07242019_PASS2.GIF


[jira] [Created] (KAFKA-8711) Kafka 2.3.0 Transient Unit Test Failures SocketServerTest. testControlPlaneRequest

2019-07-24 Thread Chandrasekhar (JIRA)
Chandrasekhar created KAFKA-8711:


 Summary: Kafka 2.3.0 Transient Unit Test Failures 
SocketServerTest. testControlPlaneRequest
 Key: KAFKA-8711
 URL: https://issues.apache.org/jira/browse/KAFKA-8711
 Project: Kafka
  Issue Type: Test
  Components: core, unit tests
Affects Versions: 2.3.0
Reporter: Chandrasekhar


Cloned the Kafka 2.3.0 source to our git repo and compiled it using 'gradle 
build'; we see the following error consistently:

Gradle Version 4.7

 

testControlPlaneRequest
java.net.BindException: Address already in use (Bind failed)
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.Socket.bind(Socket.java:644)
    at java.net.Socket.<init>(Socket.java:433)
    at java.net.Socket.<init>(Socket.java:286)
    at kafka.network.SocketServerTest.connect(SocketServerTest.scala:140)
    at kafka.network.SocketServerTest.$anonfun$testControlPlaneRequest$1(SocketServerTest.scala:200)
    at kafka.network.SocketServerTest.$anonfun$testControlPlaneRequest$1$adapted(SocketServerTest.scala:199)
    at kafka.network.SocketServerTest.withTestableServer(SocketServerTest.scala:1141)
    at kafka.network.SocketServerTest.testControlPlaneRequest(SocketServerTest.scala:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
    at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
    at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
    at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
    at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
    at com.sun.proxy.$Proxy1.processTestClass(Unknown Source)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
    at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 

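The trace above shows an environment-level port collision: the test's connect 
helper binds a client socket on a port that some other process on the build 
host already holds. As an illustration only (plain Java, not the Kafka test 
code itself), the usual way to sidestep such collisions in tests is to bind to 
port 0 and let the OS assign a free ephemeral port:

{code:java}
import java.net.ServerSocket;

public class EphemeralPortSketch {
    public static void main(String[] args) throws Exception {
        // Binding to port 0 asks the OS for any free ephemeral port, which
        // avoids "Address already in use" when a fixed port is taken.
        try (ServerSocket socket = new ServerSocket(0)) {
            int port = socket.getLocalPort(); // the OS-assigned port
            System.out.println("Bound to ephemeral port " + port);
        }
    }
}
{code}
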
[jira] [Updated] (KAFKA-8706) Kafka 2.3.0 Transient Unit Test Failures on Oracle Linux - See attached for details

2019-07-24 Thread Chandrasekhar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar updated KAFKA-8706:
-
Priority: Major  (was: Minor)
 Summary: Kafka 2.3.0 Transient Unit Test Failures on Oracle Linux - See 
attached for details  (was: Kafka 2.3.0 Unit Test Failures on Oracle Linux  - 
Need help debugging framework or issue.)

> Kafka 2.3.0 Transient Unit Test Failures on Oracle Linux - See attached for 
> details
> ---
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Major
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaAUTFailures07242019_PASS2.txt, KafkaUTFailures07242019.GIF, 
> KafkaUTFailures07242019_PASS2.GIF
>
>
> Hi
> We have just imported the KAFKA 2.3.0 source code from the git repo and are 
> compiling it using Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The 
> failed tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Please find the failures attached.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  





[jira] [Updated] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar updated KAFKA-8706:
-
Component/s: unit tests

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaAUTFailures07242019_PASS2.txt, KafkaUTFailures07242019.GIF, 
> KafkaUTFailures07242019_PASS2.GIF
>
>
> Hi
> We have just imported the KAFKA 2.3.0 source code from the git repo and are 
> compiling it using Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The 
> failed tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Please find the failures attached.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  





[jira] [Commented] (KAFKA-8673) Kafka stream threads stuck while sending offsets to transaction preventing join group from completing

2019-07-24 Thread Boyang Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892075#comment-16892075
 ] 

Boyang Chen commented on KAFKA-8673:


[~guozhang] Just to clarify: the `AddOffsetsToTxn` request is sent to the txn 
coordinator, while `TxnOffsetCommit` is sent to the group coordinator, which 
could indeed block the join group if it hangs.

> Kafka stream threads stuck while sending offsets to transaction preventing 
> join group from completing
> -
>
> Key: KAFKA-8673
> URL: https://issues.apache.org/jira/browse/KAFKA-8673
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, streams
>Affects Versions: 2.2.0
>Reporter: Varsha Abhinandan
>Priority: Major
> Attachments: Screen Shot 2019-07-11 at 12.08.09 PM.png
>
>
> We observed a deadlock-like situation in our Kafka streams application when 
> we accidentally shut down all the brokers. The Kafka cluster was brought back 
> in about an hour. 
> Observations made:
>  # Normal Kafka producers and consumers started working fine after the 
> brokers were up again. 
>  # The Kafka streams applications were stuck in the "rebalancing" state.
>  # The Kafka streams apps have exactly-once semantics enabled.
>  # The stack traces showed most of the stream threads sending join group 
> requests to the group coordinator.
>  # A few stream threads couldn't initiate the join group request since the 
> call to org.apache.kafka.clients.producer.KafkaProducer#sendOffsetsToTransaction 
> was stuck.
>  # It seems the join group requests were parked at the coordinator since the 
> expected members hadn't sent their own join group requests.
>  # After the timeout, the stream threads that were not stuck sent new join 
> group requests.  
>  # Maybe (6) and (7) are happening infinitely.
>  # Sample values of the GroupMetadata object on the group coordinator - 
> [^Screen Shot 2019-07-11 at 12.08.09 PM.png]
>  # The list of notYetJoinedMembers client ids matched the threads waiting 
> for their offsets to be committed. 
> {code:java}
> [List(MemberMetadata(memberId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-38-consumer-efa41349-3da1-43b6-9710-a662f68c63b1,
>  
> clientId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-38-consumer,
>  clientHost=/10.136.98.48, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36-consumer-7cc8e41b-ad98-4006-a18a-b22abe6350f4,
>  
> clientId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36-consumer,
>  clientHost=/10.136.103.148, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-27-consumer-9ffb96c1-3379-4cbd-bee1-5d4719fe6c9d,
>  
> clientId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-27-consumer,
>  clientHost=/10.136.98.48, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21-consumer-5b8a1f1f-84dd-4a87-86c8-7542c0e50d1f,
>  
> clientId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21-consumer,
>  clientHost=/10.136.103.148, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-994cee9b-b79b-483b-97cd-f89e8cbb015a-StreamThread-33-consumer-3cb67ec9-c548-4386-962d-64d9772bf719,
>  
> clientId=metric-extractor-stream-c1-994cee9b-b79b-483b-97cd-f89e8cbb015a-StreamThread-33-consumer,
>  clientHost=/10.136.99.15, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ))]
> vabhinandan-mac:mp-jstack varsha.abhinandan$ cat jstack.* | grep 
> "metric-extractor-stream-c1-" | grep "StreamThread-" | grep "waiting on 
> condition"
> "metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36"
>  #128 daemon prio=5 os_prio=0 tid=0x7fc53c047800 nid=0xac waiting on 
> condition [0x7fc4e68e7000]
> "metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21"
>  #93 daemon prio=5 os_prio=0 tid=0x7fc53c2b5800 nid=0x9d waiting on 
> condition [0x7fc4e77f6000]
> "metric-extractor-stream-c1-994cee9b-b79b-483b-97cd-f89e8cbb015a-StreamThread-33"
>  #125 daemon prio=5 os_prio=0 

[jira] [Commented] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892055#comment-16892055
 ] 

Chandrasekhar commented on KAFKA-8706:
--

Second Pass: Failed Test Cases: 3, per your list above:

UserQuotaTest. testQuotaOverrideDelete  - 
https://issues.apache.org/jira/browse/KAFKA-8032
UserQuotaTest. testThrottledProducerConsumer - 
https://issues.apache.org/jira/browse/KAFKA-8073
SocketServerTest. testControlPlaneRequest : Need to write a bug - will do in a 
few mins...

 

Please let me know if you know the reason for these intermittent failures (3, 
4, 6), and let us know whether this release is stable enough.

 

!KafkaUTFailures07242019_PASS2.GIF! [^KafkaAUTFailures07242019_PASS2.txt]

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaAUTFailures07242019_PASS2.txt, KafkaUTFailures07242019.GIF, 
> KafkaUTFailures07242019_PASS2.GIF
>
>
> Hi
> We have just imported the KAFKA 2.3.0 source code from the git repo and are 
> compiling it using Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The 
> failed tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Please find the failures attached.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  





[jira] [Updated] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar updated KAFKA-8706:
-
Attachment: KafkaUTFailures07242019_PASS2.GIF

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaAUTFailures07242019_PASS2.txt, KafkaUTFailures07242019.GIF, 
> KafkaUTFailures07242019_PASS2.GIF
>
>
> Hi
> We have just imported the KAFKA 2.3.0 source code from the git repo and are 
> compiling it using Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The 
> failed tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Please find the failures attached.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  





[jira] [Updated] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar updated KAFKA-8706:
-
Attachment: KafkaAUTFailures07242019_PASS2.txt

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaAUTFailures07242019_PASS2.txt, KafkaUTFailures07242019.GIF, 
> KafkaUTFailures07242019_PASS2.GIF
>
>
> Hi
> We have just imported the KAFKA 2.3.0 source code from the git repo and are 
> compiling it using Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The 
> failed tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Please find the failures attached.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  





[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID

2019-07-24 Thread Bob Barrett (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892046#comment-16892046
 ] 

Bob Barrett commented on KAFKA-7190:


The periodic producer expiration check expires producers if 1) there is no 
ongoing transaction for the producer, and 2) the max timestamp of the last 
batch written by that producer is more than `transactional.id.expiration.ms` 
ago. The default value for `transactional.id.expiration.ms` is 7 days. If the 
input topic only has messages older than 7 days, and the transformed records 
produced to it also have a timestamp older than 7 days, then the producer will 
be expired the first time that the expiration check runs without an ongoing 
transaction. KIP-360 will indirectly address this problem by allowing the 
producer to continue after receiving an UNKNOWN_PRODUCER_ID error, but a better 
solution would probably be to set the producer state timestamp based on the 
current time, not the batch timestamp, as [~rocketraman] suggested. 
[~hachikuji] what do you think?

In the meantime, another workaround would be to set 
`transactional.id.expiration.ms` to a larger number, which would allow the 
transformed records to retain the default Streams timestamp behavior.
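
The in-the-meantime workaround above is a broker-side config change. A minimal 
sketch of the override, with a purely illustrative 30-day value (the default 
is 7 days, i.e. 604800000 ms):
{code:java}
# server.properties -- illustrative value only: raise the producer-state
# expiration window from the 7-day default (604800000 ms) to 30 days.
transactional.id.expiration.ms=2592000000
{code}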

> Under low traffic conditions purging repartition topics cause WARN statements 
> about  UNKNOWN_PRODUCER_ID 
> -
>
> Key: KAFKA-7190
> URL: https://issues.apache.org/jira/browse/KAFKA-7190
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, streams
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Bill Bejeck
>Assignee: Guozhang Wang
>Priority: Major
>
> When a streams application has little traffic, it is possible that consumer 
> purging would delete even the last message sent by a producer (i.e., all the 
> messages sent by this producer have been consumed and committed), and as a 
> result, the broker would delete that producer's ID. The next time this 
> producer tries to send, it will get the UNKNOWN_PRODUCER_ID error code, but 
> in this case the error is retriable: the producer would just get a new 
> producer id and retry, and this time it will succeed. 
>  
> Possible fixes could be on the broker side, i.e., delaying the deletion of 
> the producerIDs for a more extended period, or on the streams side, 
> developing a more conservative approach to deleting offsets from repartition 
> topics.
>  
>  





[jira] [Commented] (KAFKA-8673) Kafka stream threads stuck while sending offsets to transaction preventing join group from completing

2019-07-24 Thread Varsha Abhinandan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892014#comment-16892014
 ] 

Varsha Abhinandan commented on KAFKA-8673:
--

Hi [~guozhang], the threads were blocked on TransactionalRequestResult.await 
for about 4 days. The rebalance completed only after we restarted the processes 
which had the stream threads stuck on TransactionalRequestResult.await. 

 

 

> Kafka stream threads stuck while sending offsets to transaction preventing 
> join group from completing
> -
>
> Key: KAFKA-8673
> URL: https://issues.apache.org/jira/browse/KAFKA-8673
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, streams
>Affects Versions: 2.2.0
>Reporter: Varsha Abhinandan
>Priority: Major
> Attachments: Screen Shot 2019-07-11 at 12.08.09 PM.png
>
>
> We observed a deadlock-like situation in our Kafka streams application when 
> we accidentally shut down all the brokers. The Kafka cluster was brought back 
> in about an hour. 
> Observations made:
>  # Normal Kafka producers and consumers started working fine after the 
> brokers were up again. 
>  # The Kafka streams applications were stuck in the "rebalancing" state.
>  # The Kafka streams apps have exactly-once semantics enabled.
>  # The stack traces showed most of the stream threads sending join group 
> requests to the group coordinator.
>  # A few stream threads couldn't initiate the join group request since the 
> call to org.apache.kafka.clients.producer.KafkaProducer#sendOffsetsToTransaction 
> was stuck.
>  # It seems the join group requests were parked at the coordinator since the 
> expected members hadn't sent their own join group requests.
>  # After the timeout, the stream threads that were not stuck sent new join 
> group requests.  
>  # Maybe (6) and (7) are happening infinitely.
>  # Sample values of the GroupMetadata object on the group coordinator - 
> [^Screen Shot 2019-07-11 at 12.08.09 PM.png]
>  # The list of notYetJoinedMembers client ids matched the threads waiting 
> for their offsets to be committed. 
> {code:java}
> [List(MemberMetadata(memberId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-38-consumer-efa41349-3da1-43b6-9710-a662f68c63b1,
>  
> clientId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-38-consumer,
>  clientHost=/10.136.98.48, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36-consumer-7cc8e41b-ad98-4006-a18a-b22abe6350f4,
>  
> clientId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36-consumer,
>  clientHost=/10.136.103.148, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-27-consumer-9ffb96c1-3379-4cbd-bee1-5d4719fe6c9d,
>  
> clientId=metric-extractor-stream-c1-d9ac8890-cd80-4b75-a85a-2ff39ea27961-StreamThread-27-consumer,
>  clientHost=/10.136.98.48, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21-consumer-5b8a1f1f-84dd-4a87-86c8-7542c0e50d1f,
>  
> clientId=metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21-consumer,
>  clientHost=/10.136.103.148, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ), 
> MemberMetadata(memberId=metric-extractor-stream-c1-994cee9b-b79b-483b-97cd-f89e8cbb015a-StreamThread-33-consumer-3cb67ec9-c548-4386-962d-64d9772bf719,
>  
> clientId=metric-extractor-stream-c1-994cee9b-b79b-483b-97cd-f89e8cbb015a-StreamThread-33-consumer,
>  clientHost=/10.136.99.15, sessionTimeoutMs=15000, 
> rebalanceTimeoutMs=2147483647, supportedProtocols=List(stream), ))]
> vabhinandan-mac:mp-jstack varsha.abhinandan$ cat jstack.* | grep 
> "metric-extractor-stream-c1-" | grep "StreamThread-" | grep "waiting on 
> condition"
> "metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-36"
>  #128 daemon prio=5 os_prio=0 tid=0x7fc53c047800 nid=0xac waiting on 
> condition [0x7fc4e68e7000]
> "metric-extractor-stream-c1-4875282b-1f26-47cd-affd-23ba5f26787a-StreamThread-21"
>  #93 daemon prio=5 os_prio=0 tid=0x7fc53c2b5800 nid=0x9d waiting on 
> condition [0x7fc4e77f6000]
> 

[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID

2019-07-24 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891992#comment-16891992
 ] 

Raman Gupta commented on KAFKA-7190:


> Did you override message.timestamp.difference.max.ms?

No.

> Under low traffic conditions purging repartition topics cause WARN statements 
> about  UNKNOWN_PRODUCER_ID 
> -
>
> Key: KAFKA-7190
> URL: https://issues.apache.org/jira/browse/KAFKA-7190
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, streams
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Bill Bejeck
>Assignee: Guozhang Wang
>Priority: Major
>
> When a streams application has little traffic, it is possible that consumer 
> purging would delete even the last message sent by a producer (i.e., all the 
> messages sent by this producer have been consumed and committed), and as a 
> result, the broker would delete that producer's ID. The next time this 
> producer tries to send, it will get the UNKNOWN_PRODUCER_ID error code, but 
> in this case the error is retriable: the producer would just get a new 
> producer id and retry, and this time it will succeed. 
>  
> Possible fixes could be on the broker side, i.e., delaying the deletion of 
> the producerIDs for a more extended period, or on the streams side, 
> developing a more conservative approach to deleting offsets from repartition 
> topics.
>  
>  





[jira] [Updated] (KAFKA-8709) improve consumer offsets resiliency

2019-07-24 Thread Christian Becker (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Becker updated KAFKA-8709:

Description: 
We attempted an update from 2.2 to 2.3, and then a rollback was done after 
{{inter.broker.protocol}} was changed. (We know this shouldn't be done, but it 
happened.)

After downgrading to 2.2 again, some {{__consumer-offsets}} partitions fail to 
load with the message {{Unknown group metadata version 3}}. Subsequently the 
broker continues its startup and the consumer groups won't exist. So the 
consumers start at their configured OLDEST or NEWEST position and start 
committing their offsets.

However, on subsequent restarts of the brokers, the {{Unknown group metadata 
version}} exception remains, and so the offset resets happen over and over 
again.

 

In order to prevent this, I'd suggest an updated flow when loading the offsets 
(see the sketch after this list):
 - the loading should continue reading the __consumer-offsets partition to see 
if a subsequent offset is committed that is readable
 - if no "valid" offset can be found, throw the existing exception to let the 
operator know about the situation
 - if a valid offset can be found, continue as normal
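
A minimal sketch of that flow, with every name and record shape invented for 
illustration (this is not Kafka's actual offset-loading code):
{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class OffsetLoadSketch {
    static class CorruptRecordException extends RuntimeException {
        CorruptRecordException(String message) { super(message); }
    }

    // Stand-in for a raw __consumer-offsets record: a metadata version plus
    // the committed offset it carries.
    static class RawRecord {
        final int metadataVersion;
        final long offset;
        RawRecord(int metadataVersion, long offset) {
            this.metadataVersion = metadataVersion;
            this.offset = offset;
        }
    }

    static long parse(RawRecord record) {
        if (record.metadataVersion > 2) // stand-in for "Unknown group metadata version 3"
            throw new CorruptRecordException("Unknown group metadata version " + record.metadataVersion);
        return record.offset;
    }

    // Keep scanning past corrupt records; only surface the existing error if
    // nothing readable follows them.
    static List<Long> loadOffsets(Iterator<RawRecord> records) {
        List<Long> valid = new ArrayList<>();
        boolean sawCorrupt = false;
        while (records.hasNext()) {
            try {
                valid.add(parse(records.next()));
            } catch (CorruptRecordException e) {
                sawCorrupt = true; // remember, but keep reading the partition
            }
        }
        if (valid.isEmpty() && sawCorrupt)
            throw new CorruptRecordException("no readable offset after corrupt records");
        return valid; // later valid commits win, so offsets survive restarts
    }
}
{code}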

 

This would cause the following sequence of events:
 1. corrupted offsets are written
 2. broker restart
 2a. broker loads offset partition
 2b. {{KafkaException}} when loading the offset partition
 2c. no "valid" offset is found after the "corrupt" record
 2d. offsets reset
 3. consumergroups are recreated and "valid" offsets are appended
 4. broker restart
 4a. broker loads offset partition
 4b.  {{KafkaException}} when loading the offset partition
 4c. "valid" offset is found after the "corrupted" ones
 5. consumergroups still have their latest offset

It's a special case that this happened after some human error, but it also 
poses a danger for situations where the offsets might be corrupted for some 
unrelated reason. Losing the offsets is a very serious situation and there 
should be safeguards against it, especially when there might be recoverable 
offsets. With this improvement, the offsets would still be lost once, but the 
broker would be able to recover automatically over time, and after compaction 
the corrupted records will be gone. (In our case this caused serious 
confusion, as we lost the offsets multiple times: the error message {{Error 
loading offsets from}} implies that the corrupted data is deleted and 
therefore the situation is recovered, whereas in reality this continues to be 
an issue until the corrupt data is gone once and for all, which might take a 
long time.)

In our case we seem to have evicted the broken records by temporarily setting 
the segment time to a very low value and deactivating compaction:
{code:java}
# Force small segments and switch __consumer_offsets to time-based deletion
/opt/kafka/bin/kafka-topics.sh --alter --config segment.ms=90 --topic __consumer_offsets --zookeeper localhost:2181
/opt/kafka/bin/kafka-topics.sh --alter --config cleanup.policy=delete --topic __consumer_offsets --zookeeper localhost:2181
/opt/kafka/bin/kafka-topics.sh --alter --config retention.ms=180 --topic __consumer_offsets --zookeeper localhost:2181
# < wait for the cleaner to clean up >
# Restore compaction and remove the temporary overrides
/opt/kafka/bin/kafka-topics.sh --alter --delete-config segment.ms --topic __consumer_offsets --zookeeper localhost:2181
/opt/kafka/bin/kafka-topics.sh --alter --config cleanup.policy=compact --topic __consumer_offsets --zookeeper localhost:2181
/opt/kafka/bin/kafka-topics.sh --alter --delete-config retention.ms --topic __consumer_offsets --zookeeper localhost:2181
# < hope all consumer groups committed their offset before a failover needs to happen >
{code}
 

  was:
We attempted to do an update from 2.2 to 2.3 and then a rollback was done after 
{{inter.broker.protocol}} was changed. (We know this shouldn't be done, but it 
happened).

After downgrading to 2.2 again, some {{__consumer-offsets}} partitions fail to 
load with the message {{Unknown group metadata version 3}}. Subsequently the 
broker continues it's startup and the consumer groups won't exist. So the 
consumers are starting at their configured OLDEST or NEWEST position and start 
committing their offsets.

However on subsequent restarts of the brokers, the {{Unknown group metadata 
version}} exception remains and so the restarts are happening over and over 
again.

 

In order to prevent this, I'd suggest a updated flow when loading the offsets:
- the loading should continue reading the __consumer-offsets partition to see 
if a subsequent offset is commited that is readable
- if no "valid" offset could be found, throw the existing exception to let the 
operator know about the situation
- if a valid offset can be found, continue as normal

 

This would cause the following sequence of events:
1. corrupted offsets are written
2. broker restart
2a. broker loads offset partition
2b. {{KafkaException}} when loading the offset partition
2c. no 

[jira] [Commented] (KAFKA-7940) Flaky Test CustomQuotaCallbackTest#testCustomQuotaCallback

2019-07-24 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891987#comment-16891987
 ] 

Matthias J. Sax commented on KAFKA-7940:


[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3809/tests]

> Flaky Test CustomQuotaCallbackTest#testCustomQuotaCallback
> --
>
> Key: KAFKA-7940
> URL: https://issues.apache.org/jira/browse/KAFKA-7940
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Stanislav Kozlovski
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.2.0
>
>
> To get stable nightly builds for `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/14/]
> {quote}java.lang.AssertionError: Too many quotaLimit calls Map(PRODUCE -> 1, 
> FETCH -> 1, REQUEST -> 4) at org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.CustomQuotaCallbackTest.testCustomQuotaCallback(CustomQuotaCallbackTest.scala:105){quote}





[jira] [Updated] (KAFKA-8122) Flaky Test EosIntegrationTest#shouldNotViolateEosIfOneTaskFailsWithState

2019-07-24 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8122:
---
Affects Version/s: 2.4.0

> Flaky Test EosIntegrationTest#shouldNotViolateEosIfOneTaskFailsWithState
> 
>
> Key: KAFKA-8122
> URL: https://issues.apache.org/jira/browse/KAFKA-8122
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
>  Labels: flaky-test
> Fix For: 2.4.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3285/testReport/junit/org.apache.kafka.streams.integration/EosIntegrationTest/shouldNotViolateEosIfOneTaskFailsWithState/]
> {quote}java.lang.AssertionError: Expected: <[KeyValue(0, 0), KeyValue(0, 1), 
> KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 
> 21), KeyValue(0, 28), KeyValue(0, 36), KeyValue(0, 45)]> but: was 
> <[KeyValue(0, 0), KeyValue(0, 1), KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 
> 10), KeyValue(0, 15), KeyValue(0, 21), KeyValue(0, 28), KeyValue(0, 36), 
> KeyValue(0, 45), KeyValue(0, 55), KeyValue(0, 66), KeyValue(0, 78), 
> KeyValue(0, 91), KeyValue(0, 105)]> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6) at 
> org.apache.kafka.streams.integration.EosIntegrationTest.checkResultPerKey(EosIntegrationTest.java:212)
>  at 
> org.apache.kafka.streams.integration.EosIntegrationTest.shouldNotViolateEosIfOneTaskFailsWithState(EosIntegrationTest.java:414){quote}
> STDOUT
> {quote}[2019-03-17 01:19:51,971] INFO Created server with tickTime 800 
> minSessionTimeout 1600 maxSessionTimeout 16000 datadir 
> /tmp/kafka-10997967593034298484/version-2 snapdir 
> /tmp/kafka-5184295822696533708/version-2 
> (org.apache.zookeeper.server.ZooKeeperServer:174) [2019-03-17 01:19:51,971] 
> INFO binding to port /127.0.0.1:0 
> (org.apache.zookeeper.server.NIOServerCnxnFactory:89) [2019-03-17 
> 01:19:51,973] INFO KafkaConfig values: advertised.host.name = null 
> advertised.listeners = null advertised.port = null 
> alter.config.policy.class.name = null 
> alter.log.dirs.replication.quota.window.num = 11 
> alter.log.dirs.replication.quota.window.size.seconds = 1 
> authorizer.class.name = auto.create.topics.enable = false 
> auto.leader.rebalance.enable = true background.threads = 10 broker.id = 0 
> broker.id.generation.enable = true broker.rack = null 
> client.quota.callback.class = null compression.type = producer 
> connection.failed.authentication.delay.ms = 100 connections.max.idle.ms = 
> 60 connections.max.reauth.ms = 0 control.plane.listener.name = null 
> controlled.shutdown.enable = true controlled.shutdown.max.retries = 3 
> controlled.shutdown.retry.backoff.ms = 5000 controller.socket.timeout.ms = 
> 3 create.topic.policy.class.name = null default.replication.factor = 1 
> delegation.token.expiry.check.interval.ms = 360 
> delegation.token.expiry.time.ms = 8640 delegation.token.master.key = null 
> delegation.token.max.lifetime.ms = 60480 
> delete.records.purgatory.purge.interval.requests = 1 delete.topic.enable = 
> true fetch.purgatory.purge.interval.requests = 1000 
> group.initial.rebalance.delay.ms = 0 group.max.session.timeout.ms = 30 
> group.max.size = 2147483647 group.min.session.timeout.ms = 0 host.name = 
> localhost inter.broker.listener.name = null inter.broker.protocol.version = 
> 2.2-IV1 kafka.metrics.polling.interval.secs = 10 kafka.metrics.reporters = [] 
> leader.imbalance.check.interval.seconds = 300 
> leader.imbalance.per.broker.percentage = 10 listener.security.protocol.map = 
> PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL 
> listeners = null log.cleaner.backoff.ms = 15000 
> log.cleaner.dedupe.buffer.size = 2097152 log.cleaner.delete.retention.ms = 
> 8640 log.cleaner.enable = true log.cleaner.io.buffer.load.factor = 0.9 
> log.cleaner.io.buffer.size = 524288 log.cleaner.io.max.bytes.per.second = 
> 1.7976931348623157E308 log.cleaner.min.cleanable.ratio = 0.5 
> log.cleaner.min.compaction.lag.ms = 0 log.cleaner.threads = 1 
> log.cleanup.policy = [delete] log.dir = 
> /tmp/junit16020146621422955757/junit17406374597406011269 log.dirs = null 
> log.flush.interval.messages = 9223372036854775807 log.flush.interval.ms = 
> null log.flush.offset.checkpoint.interval.ms = 6 
> log.flush.scheduler.interval.ms = 9223372036854775807 
> log.flush.start.offset.checkpoint.interval.ms = 6 
> log.index.interval.bytes = 4096 log.index.size.max.bytes = 10485760 
> log.message.downconversion.enable = true log.message.format.version = 2.2-IV1 
> 

[jira] [Commented] (KAFKA-8122) Flaky Test EosIntegrationTest#shouldNotViolateEosIfOneTaskFailsWithState

2019-07-24 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891988#comment-16891988
 ] 

Matthias J. Sax commented on KAFKA-8122:


[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3809/tests]

> Flaky Test EosIntegrationTest#shouldNotViolateEosIfOneTaskFailsWithState
> 
>
> Key: KAFKA-8122
> URL: https://issues.apache.org/jira/browse/KAFKA-8122
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
>  Labels: flaky-test
> Fix For: 2.4.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3285/testReport/junit/org.apache.kafka.streams.integration/EosIntegrationTest/shouldNotViolateEosIfOneTaskFailsWithState/]
> {quote}java.lang.AssertionError: Expected: <[KeyValue(0, 0), KeyValue(0, 1), 
> KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 
> 21), KeyValue(0, 28), KeyValue(0, 36), KeyValue(0, 45)]> but: was 
> <[KeyValue(0, 0), KeyValue(0, 1), KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 
> 10), KeyValue(0, 15), KeyValue(0, 21), KeyValue(0, 28), KeyValue(0, 36), 
> KeyValue(0, 45), KeyValue(0, 55), KeyValue(0, 66), KeyValue(0, 78), 
> KeyValue(0, 91), KeyValue(0, 105)]> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6) at 
> org.apache.kafka.streams.integration.EosIntegrationTest.checkResultPerKey(EosIntegrationTest.java:212)
>  at 
> org.apache.kafka.streams.integration.EosIntegrationTest.shouldNotViolateEosIfOneTaskFailsWithState(EosIntegrationTest.java:414){quote}
> STDOUT
> {quote}[2019-03-17 01:19:51,971] INFO Created server with tickTime 800 
> minSessionTimeout 1600 maxSessionTimeout 16000 datadir 
> /tmp/kafka-10997967593034298484/version-2 snapdir 
> /tmp/kafka-5184295822696533708/version-2 
> (org.apache.zookeeper.server.ZooKeeperServer:174) [2019-03-17 01:19:51,971] 
> INFO binding to port /127.0.0.1:0 
> (org.apache.zookeeper.server.NIOServerCnxnFactory:89) [2019-03-17 
> 01:19:51,973] INFO KafkaConfig values: advertised.host.name = null 
> advertised.listeners = null advertised.port = null 
> alter.config.policy.class.name = null 
> alter.log.dirs.replication.quota.window.num = 11 
> alter.log.dirs.replication.quota.window.size.seconds = 1 
> authorizer.class.name = auto.create.topics.enable = false 
> auto.leader.rebalance.enable = true background.threads = 10 broker.id = 0 
> broker.id.generation.enable = true broker.rack = null 
> client.quota.callback.class = null compression.type = producer 
> connection.failed.authentication.delay.ms = 100 connections.max.idle.ms = 
> 60 connections.max.reauth.ms = 0 control.plane.listener.name = null 
> controlled.shutdown.enable = true controlled.shutdown.max.retries = 3 
> controlled.shutdown.retry.backoff.ms = 5000 controller.socket.timeout.ms = 
> 3 create.topic.policy.class.name = null default.replication.factor = 1 
> delegation.token.expiry.check.interval.ms = 360 
> delegation.token.expiry.time.ms = 8640 delegation.token.master.key = null 
> delegation.token.max.lifetime.ms = 60480 
> delete.records.purgatory.purge.interval.requests = 1 delete.topic.enable = 
> true fetch.purgatory.purge.interval.requests = 1000 
> group.initial.rebalance.delay.ms = 0 group.max.session.timeout.ms = 30 
> group.max.size = 2147483647 group.min.session.timeout.ms = 0 host.name = 
> localhost inter.broker.listener.name = null inter.broker.protocol.version = 
> 2.2-IV1 kafka.metrics.polling.interval.secs = 10 kafka.metrics.reporters = [] 
> leader.imbalance.check.interval.seconds = 300 
> leader.imbalance.per.broker.percentage = 10 listener.security.protocol.map = 
> PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL 
> listeners = null log.cleaner.backoff.ms = 15000 
> log.cleaner.dedupe.buffer.size = 2097152 log.cleaner.delete.retention.ms = 
> 8640 log.cleaner.enable = true log.cleaner.io.buffer.load.factor = 0.9 
> log.cleaner.io.buffer.size = 524288 log.cleaner.io.max.bytes.per.second = 
> 1.7976931348623157E308 log.cleaner.min.cleanable.ratio = 0.5 
> log.cleaner.min.compaction.lag.ms = 0 log.cleaner.threads = 1 
> log.cleanup.policy = [delete] log.dir = 
> /tmp/junit16020146621422955757/junit17406374597406011269 log.dirs = null 
> log.flush.interval.messages = 9223372036854775807 log.flush.interval.ms = 
> null log.flush.offset.checkpoint.interval.ms = 6 
> log.flush.scheduler.interval.ms = 9223372036854775807 
> log.flush.start.offset.checkpoint.interval.ms = 6 
> log.index.interval.bytes = 4096 log.index.size.max.bytes 

[jira] [Commented] (KAFKA-8589) Flakey test ResetConsumerGroupOffsetTest#testResetOffsetsExistingTopic

2019-07-24 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891985#comment-16891985
 ] 

Matthias J. Sax commented on KAFKA-8589:


[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3809/tests]

> Flakey test ResetConsumerGroupOffsetTest#testResetOffsetsExistingTopic
> --
>
> Key: KAFKA-8589
> URL: https://issues.apache.org/jira/browse/KAFKA-8589
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, clients, unit tests
>Affects Versions: 2.4.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5724/consoleFull]
> *20:25:15* 
> kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic 
> failed, log available in 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic.test.stdout*20:25:15*
>  *20:25:15* kafka.admin.ResetConsumerGroupOffsetTest > 
> testResetOffsetsExistingTopic FAILED*20:25:15* 
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.*20:25:15* at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)*20:25:15*
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)*20:25:15*
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)*20:25:15*
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)*20:25:15*
>  at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$resetOffsets$1(ConsumerGroupCommand.scala:379)*20:25:15*
>  at 
> scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:160)*20:25:15*
>  at 
> scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:160)*20:25:15*
>  at scala.collection.Iterator.foreach(Iterator.scala:941)*20:25:15*   
>   at scala.collection.Iterator.foreach$(Iterator.scala:941)*20:25:15* 
> at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1429)*20:25:15*  
>at scala.collection.IterableLike.foreach(IterableLike.scala:74)*20:25:15*  
>at 
> scala.collection.IterableLike.foreach$(IterableLike.scala:73)*20:25:15*   
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)*20:25:15*   
>   at 
> scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:160)*20:25:15*
>  at 
> scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:158)*20:25:15*
>  at 
> scala.collection.AbstractTraversable.foldLeft(Traversable.scala:108)*20:25:15*
>  at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService.resetOffsets(ConsumerGroupCommand.scala:377)*20:25:15*
>  at 
> kafka.admin.ResetConsumerGroupOffsetTest.resetOffsets(ResetConsumerGroupOffsetTest.scala:507)*20:25:15*
>  at 
> kafka.admin.ResetConsumerGroupOffsetTest.resetAndAssertOffsets(ResetConsumerGroupOffsetTest.scala:477)*20:25:15*
>  at 
> kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic(ResetConsumerGroupOffsetTest.scala:123)*20:25:15*
>  *20:25:15* Caused by:*20:25:15* 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.*20*





[jira] [Commented] (KAFKA-8671) NullPointerException occurs if topic associated with GlobalKTable changes

2019-07-24 Thread Alex Leung (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891984#comment-16891984
 ] 

Alex Leung commented on KAFKA-8671:
---

Thanks for taking a look, [~guozhang]. I also wanted to add a unit test for 
this but haven't had time yet. Will add that and submit a PR.

> NullPointerException occurs if topic associated with GlobalKTable changes
> -
>
> Key: KAFKA-8671
> URL: https://issues.apache.org/jira/browse/KAFKA-8671
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Alex Leung
>Assignee: Alex Leung
>Priority: Critical
>
> The following NullPointerException occurs when the global/.checkpoint file 
> contains a line with a topic previously associated with (but no longer 
> configured for) a GlobalKTable:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.update(GlobalStateUpdateTask.java:85)
> at 
> org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.pollAndUpdate(GlobalStreamThread.java:241)
> at 
> org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:290){code}
>  
> After line 84 
> ([https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/GlobalStateUpdateTask.java#L84]),
>  `sourceNodeAndDeserializer` is null for the old, but still valid, topic. 
> This can be reproduced with the following sequence:
>  # create a GlobalKTable associated with topic, 'global-topic1'
>  # change the topic associated with the GlobalKTable to 'global-topic2' 
>  ##  at this point, the global/.checkpoint file will contain lines for both 
> topics
>  # produce messages to previous topic ('global-topic1')
>  # the consumer will attempt to consume from global-topic1, but no 
> deserializer associated with global-topic1 will be found and the NPE will 
> occur
> It looks like the following recent commit has included checkpoint validations 
> that may prevent this issue: 
> https://github.com/apache/kafka/commit/53b4ce5c00d61be87962f603682873665155cec4#diff-cc98a6c20f2a8483e1849aea6921c34dR425
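
The defensive shape of a fix can be sketched as follows; all names here are 
invented for illustration, and this is not the actual GlobalStateUpdateTask 
code. The idea is simply to look up the handler for the record's topic and 
skip the record with a warning when the topic is no longer registered, rather 
than dereferencing the null lookup result:
{code:java}
import java.util.Map;
import java.util.function.Consumer;

public class GlobalUpdateGuardSketch {
    // Hypothetical registry: topic -> handler that deserializes and applies a record.
    private final Map<String, Consumer<byte[]>> handlersByTopic;

    public GlobalUpdateGuardSketch(Map<String, Consumer<byte[]>> handlersByTopic) {
        this.handlersByTopic = handlersByTopic;
    }

    public void update(String topic, byte[] rawValue) {
        Consumer<byte[]> handler = handlersByTopic.get(topic);
        if (handler == null) {
            // The topic appears in the checkpoint file but is no longer
            // configured for any GlobalKTable: skip instead of throwing an NPE.
            System.err.println("Skipping record from unregistered global topic " + topic);
            return;
        }
        handler.accept(rawValue);
    }
}
{code}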





[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID

2019-07-24 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891976#comment-16891976
 ] 

Guozhang Wang commented on KAFKA-7190:
--

Hmm interesting, in this case the topic is not a repartition topic so no 
purge-records requests would be sent (note that in this case even if the 
messages are not yet physically removed, the producer id would still be 
deleted). Did you override {{message.timestamp.difference.max.ms}}?

cc [~bob-barrett] who's working on KIP-360 now, maybe he can chime in with some 
insights.

> Under low traffic conditions purging repartition topics cause WARN statements 
> about  UNKNOWN_PRODUCER_ID 
> -
>
> Key: KAFKA-7190
> URL: https://issues.apache.org/jira/browse/KAFKA-7190
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, streams
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Bill Bejeck
>Assignee: Guozhang Wang
>Priority: Major
>
> When a streams application has little traffic, it is possible that consumer 
> purging would delete even the last message sent by a producer (i.e., all the 
> messages sent by this producer have been consumed and committed), and as a 
> result, the broker would delete that producer's ID. The next time this 
> producer tries to send, it will get the UNKNOWN_PRODUCER_ID error code, but 
> in this case the error is retriable: the producer would just get a new 
> producer id and retry, and this time it will succeed. 
>  
> Possible fixes could be on the broker side, i.e., delaying the deletion of 
> the producerIDs for a more extended period, or on the streams side, 
> developing a more conservative approach to deleting offsets from repartition 
> topics.
>  
>  





[jira] [Created] (KAFKA-8710) InitProducerId changes for KIP-360

2019-07-24 Thread Bob Barrett (JIRA)
Bob Barrett created KAFKA-8710:
--

 Summary: InitProducerId changes for KIP-360
 Key: KAFKA-8710
 URL: https://issues.apache.org/jira/browse/KAFKA-8710
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 2.3.0
Reporter: Bob Barrett
Assignee: Bob Barrett


As part of KIP-360, InitProducerId needs to accept two additional parameters, 
the current producerId and the current producerEpoch, and it needs to allow 
producers to safely re-initialize a producer ID and continue processing as long 
as no other producer with the same transactional ID has started up.
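
Illustratively, the extended request would carry fields like the following; 
the names are assumptions taken from the ticket text, not the actual wire 
format:
{code:java}
// Sketch only: field names are assumptions, not the InitProducerId wire format.
public class InitProducerIdRequestSketch {
    final String transactionalId;
    final long currentProducerId;     // new in KIP-360: the producer's existing id
    final short currentProducerEpoch; // new in KIP-360: the producer's existing epoch

    InitProducerIdRequestSketch(String transactionalId,
                                long currentProducerId,
                                short currentProducerEpoch) {
        this.transactionalId = transactionalId;
        this.currentProducerId = currentProducerId;
        this.currentProducerEpoch = currentProducerEpoch;
    }
}
{code}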





[jira] [Commented] (KAFKA-8671) NullPointerException occurs if topic associated with GlobalKTable changes

2019-07-24 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891971#comment-16891971
 ] 

Guozhang Wang commented on KAFKA-8671:
--

I looked at the diff and it looks promising to me.

Could you file a PR following the contribution guidelines as well? 
https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes#ContributingCodeChanges-PullRequest

Also cc [~vvcephei] for reviewing this change.

> NullPointerException occurs if topic associated with GlobalKTable changes
> -
>
> Key: KAFKA-8671
> URL: https://issues.apache.org/jira/browse/KAFKA-8671
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Alex Leung
>Assignee: Alex Leung
>Priority: Critical
>
> The following NullPointerException occurs when the global/.checkpoint file 
> contains a line with a topic previously associated with (but no longer 
> configured for) a GlobalKTable:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.update(GlobalStateUpdateTask.java:85)
> at 
> org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.pollAndUpdate(GlobalStreamThread.java:241)
> at 
> org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:290){code}
>  
> After line 84 
> ([https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/GlobalStateUpdateTask.java#L84]),
>  `sourceNodeAndDeserializer` is null for the old, but still valid, topic. 
> This can be reproduced with the following sequence:
>  # create a GlobalKTable associated with topic, 'global-topic1'
>  # change the topic associated with the GlobalKTable to 'global-topic2' 
>  ##  at this point, the global/.checkpoint file will contain lines for both 
> topics
>  # produce messages to previous topic ('global-topic1')
>  # the consumer will attempt to consume from global-topic1, but no 
> deserializer associated with global-topic1 will be found and the NPE will 
> occur
> It looks like the following recent commit has included checkpoint validations 
> that may prevent this issue: 
> https://github.com/apache/kafka/commit/53b4ce5c00d61be87962f603682873665155cec4#diff-cc98a6c20f2a8483e1849aea6921c34dR425





[jira] [Commented] (KAFKA-8396) Clean up Transformer API

2019-07-24 Thread John Roesler (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891913#comment-16891913
 ] 

John Roesler commented on KAFKA-8396:
-

From the discussion of KIP-478, Guozhang had the following comments relevant 
to this ticket:

{quote}Hi John,

Just a wild thought about Transformer: now with the new Processor#init(ProcessorContext), do we still need a
Transformer (and even ValueTransformer / ValueTransformerWithKey)?

What if:

* We just make KStream#transform to get a ProcessorSupplier as well, and
inside `process()` we check that at most one `context.forward()` is called,
and then take it as the return value.
* We would still use ValueTransformer for KStream#transformValue, or we can
also use a `ProcessorSupplier` where we allow at most one
`context.forward()` AND we ignore whatever is passed in as the key, but just use
the original key.


Guozhang{quote}
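
To make the first bullet concrete, a toy sketch (illustrative names, not the 
Streams API) of a wrapper that rejects more than one forward per record:

{code:java}
import java.util.function.Consumer;

// Toy model of "check that at most one context.forward() is called".
public class SingleForward<V> {
    private final Consumer<V> downstream;
    private int forwards = 0;

    public SingleForward(final Consumer<V> downstream) {
        this.downstream = downstream;
    }

    public void forward(final V value) {
        if (++forwards > 1) {
            throw new IllegalStateException("transform must forward at most once per record");
        }
        downstream.accept(value);
    }

    public static void main(String[] args) {
        final SingleForward<String> ctx = new SingleForward<>(System.out::println);
        ctx.forward("first");  // ok
        ctx.forward("second"); // throws
    }
}
{code}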

> Clean up Transformer API
> 
>
> Key: KAFKA-8396
> URL: https://issues.apache.org/jira/browse/KAFKA-8396
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Major
>  Labels: needs-kip, user-experience
>
> Currently, KStream operators transformValues and flatTransformValues disable 
> context forwarding, and force operators to just return the new values.
> The reason is that we wanted to prevent the key from changing, since the 
> whole point of a `xValues` transformation is that we _do not_ change the key, 
> and hence don't need to repartition.
> However, the chosen mechanism has some drawbacks: The Transform concept is 
> basically a way to plug in a custom Processor within the Streams DSL, but 
> these restrictions make it more like a MapValues with access to the context. 
> For example, even though you can still schedule punctuations, there's no way 
> to forward values as a result of them. So, as a user, it's hard to build a 
> mental model of how to use a TransformValues (because it's not quite a 
> Transformer and not quite a Mapper).
> Also, logically, a Transformer can call forward as much as it wants, so a 
> Transformer and a FlatTransformer are effectively the same thing. Then, we 
> also have TransformValues and FlatTransformValues that are also two more 
> versions of the same thing, just to implement the key restrictions. 
> Internally, some of these can send downstream by returning OR forwarding, and 
> others can only return. It's a lot for users to keep in mind.
> We can clean up this API significantly by just allowing all transformers to 
> call `forward`. In the `Values` case, we can wrap the ProcessorContext in one 
> that checks the key is `equal` to the one that got passed in (i.e., saves a 
> reference and enforces equality with that reference in any call to 
> `forward`). Then, we can actually deprecate the `*ValueTransformer*` 
> interfaces and remove the restriction about calling forward.
> We can consider a further cleanup (TBD) to deprecate the existing Transformer 
> interface entirely, and replace it with one with a `void` return type. Then, 
> the Transform and FlatTransform cases collapse together, and we just need 
> Transform and TransformValues.
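
A minimal, self-contained sketch of the key-equality idea (toy types, not the 
actual ProcessorContext): the wrapper saves a reference to the incoming key and 
rejects any forward that changes it.

{code:java}
import java.util.Objects;
import java.util.function.BiConsumer;

// Toy model of a context wrapper for the *Values case: any forward() must
// keep the key it was constructed with, so no repartitioning is ever needed.
public class KeyGuard<K, V> {
    private final K fixedKey;
    private final BiConsumer<K, V> downstream;

    public KeyGuard(final K fixedKey, final BiConsumer<K, V> downstream) {
        this.fixedKey = fixedKey;
        this.downstream = downstream;
    }

    public void forward(final K key, final V value) {
        if (!Objects.equals(fixedKey, key)) {
            throw new IllegalStateException("transformValues must not change the key");
        }
        downstream.accept(key, value);
    }

    public static void main(String[] args) {
        final KeyGuard<String, Integer> ctx =
            new KeyGuard<>("a", (k, v) -> System.out.println(k + "=" + v));
        ctx.forward("a", 1); // ok
        ctx.forward("b", 2); // throws: key changed
    }
}
{code}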



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891890#comment-16891890
 ] 

Chandrasekhar commented on KAFKA-8706:
--

Re-running just one more time to confirm. If we hit these failures 3 times, I will 
have to change the priority to critical to make sure there is no impending danger 
to actually running services. Kindly advise on severity, and let me know if you 
need any more information from this effort; we are just trying to get to the 
latest release so we are not left behind.

 

Appreciate the swift response again!

 

Best regards

 

Chandra Rao

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaUTFailures07242019.GIF
>
>
> Hi
> We have just imported the Kafka 2.3.0 source code from the git repo and are compiling 
> it with Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The failed 
> tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Attached are the failures.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8705) NullPointerException was thrown by topology optimization when two MergeNodes have common KeyChaingingNode

2019-07-24 Thread Bill Bejeck (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891887#comment-16891887
 ] 

Bill Bejeck edited comment on KAFKA-8705 at 7/24/19 2:12 PM:
-

[~hiro116s] thanks for the detailed explanation and the comprehensive analysis! 
I'll take a look at fixing this.  Thanks again for taking the time to report 
this issue.

 

-Bill


was (Author: bbejeck):
[~hiro116s] thanks for reporting this, we'll take a look into this.

> NullPointerException was thrown by topology optimization when two MergeNodes 
> have common KeyChaingingNode
> -
>
> Key: KAFKA-8705
> URL: https://issues.apache.org/jira/browse/KAFKA-8705
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Hiroshi Nakahara
>Assignee: Bill Bejeck
>Priority: Major
>
> NullPointerException was thrown by topology optimization when two MergeNodes 
> have common KeyChaingingNode.
> Kafka Streams version: 2.3.0
> h3. Code
> {code:java}
> import org.apache.kafka.common.serialization.Serdes;
> import org.apache.kafka.streams.StreamsBuilder;
> import org.apache.kafka.streams.StreamsConfig;
> import org.apache.kafka.streams.kstream.Consumed;
> import org.apache.kafka.streams.kstream.KStream;
> import java.util.Properties;
> public class Main {
>     public static void main(String[] args) {
>         final StreamsBuilder streamsBuilder = new StreamsBuilder();
>         final KStream<Integer, Integer> parentStream = streamsBuilder
>             .stream("parentTopic", Consumed.with(Serdes.Integer(), Serdes.Integer()))
>             .selectKey(Integer::sum);  // To make parentStream a key-changing point
>         final KStream<Integer, Integer> childStream1 = parentStream.mapValues(v -> v + 1);
>         final KStream<Integer, Integer> childStream2 = parentStream.mapValues(v -> v + 2);
>         final KStream<Integer, Integer> childStream3 = parentStream.mapValues(v -> v + 3);
>         childStream1
>             .merge(childStream2)
>             .merge(childStream3)
>             .to("outputTopic");
>         final Properties properties = new Properties();
>         properties.setProperty(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
>         streamsBuilder.build(properties);
>     }
> }
> {code}
> h3. Expected result
> streamsBuilder.build should create a Topology without throwing an exception.  The 
> expected topology is:
> {code:java}
> Topologies:
>Sub-topology: 0
> Source: KSTREAM-SOURCE-00 (topics: [parentTopic])
>   --> KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-KEY-SELECT-01 (stores: [])
>   --> KSTREAM-MAPVALUES-02, KSTREAM-MAPVALUES-03, 
> KSTREAM-MAPVALUES-04
>   <-- KSTREAM-SOURCE-00
> Processor: KSTREAM-MAPVALUES-02 (stores: [])
>   --> KSTREAM-MERGE-05
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MAPVALUES-03 (stores: [])
>   --> KSTREAM-MERGE-05
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MAPVALUES-04 (stores: [])
>   --> KSTREAM-MERGE-06
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MERGE-05 (stores: [])
>   --> KSTREAM-MERGE-06
>   <-- KSTREAM-MAPVALUES-02, KSTREAM-MAPVALUES-03
> Processor: KSTREAM-MERGE-06 (stores: [])
>   --> KSTREAM-SINK-07
>   <-- KSTREAM-MERGE-05, KSTREAM-MAPVALUES-04
> Sink: KSTREAM-SINK-07 (topic: outputTopic)
>   <-- KSTREAM-MERGE-06
> {code}
> h3. Actual result
> NullPointerException was thrown with the following stacktrace.
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
>   at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybeUpdateKeyChangingRepartitionNodeMap(InternalStreamsBuilder.java:397)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybeOptimizeRepartitionOperations(InternalStreamsBuilder.java:315)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybePerformOptimizations(InternalStreamsBuilder.java:304)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.buildAndOptimizeTopology(InternalStreamsBuilder.java:275)
>   at 
> org.apache.kafka.streams.StreamsBuilder.build(StreamsBuilder.java:558)
>   at Main.main(Main.java:24){code}
> h3. Cause
> This exception occurs in 
> InternalStreamsBuilder#maybeUpdateKeyChangingRepartitionNodeMap.
> {code:java}
> private void maybeUpdateKeyChangingRepartitionNodeMap() {
>     final Map<StreamsGraphNode, Set<StreamsGraphNode>> 
> 

[jira] [Commented] (KAFKA-8705) NullPointerException was thrown by topology optimization when two MergeNodes have common KeyChaingingNode

2019-07-24 Thread Bill Bejeck (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891887#comment-16891887
 ] 

Bill Bejeck commented on KAFKA-8705:


[~hiro116s] thanks for reporting this, we'll take a look into this.

> NullPointerException was thrown by topology optimization when two MergeNodes 
> have common KeyChaingingNode
> -
>
> Key: KAFKA-8705
> URL: https://issues.apache.org/jira/browse/KAFKA-8705
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Hiroshi Nakahara
>Assignee: Bill Bejeck
>Priority: Major
>
> NullPointerException was thrown by topology optimization when two MergeNodes 
> have common KeyChaingingNode.
> Kafka Streams version: 2.3.0
> h3. Code
> {code:java}
> import org.apache.kafka.common.serialization.Serdes;
> import org.apache.kafka.streams.StreamsBuilder;
> import org.apache.kafka.streams.StreamsConfig;
> import org.apache.kafka.streams.kstream.Consumed;
> import org.apache.kafka.streams.kstream.KStream;
> import java.util.Properties;
> public class Main {
>     public static void main(String[] args) {
>         final StreamsBuilder streamsBuilder = new StreamsBuilder();
>         final KStream<Integer, Integer> parentStream = streamsBuilder
>             .stream("parentTopic", Consumed.with(Serdes.Integer(), Serdes.Integer()))
>             .selectKey(Integer::sum);  // To make parentStream a key-changing point
>         final KStream<Integer, Integer> childStream1 = parentStream.mapValues(v -> v + 1);
>         final KStream<Integer, Integer> childStream2 = parentStream.mapValues(v -> v + 2);
>         final KStream<Integer, Integer> childStream3 = parentStream.mapValues(v -> v + 3);
>         childStream1
>             .merge(childStream2)
>             .merge(childStream3)
>             .to("outputTopic");
>         final Properties properties = new Properties();
>         properties.setProperty(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
>         streamsBuilder.build(properties);
>     }
> }
> {code}
> h3. Expected result
> streamsBuilder.build should create a Topology without throwing an exception.  The 
> expected topology is:
> {code:java}
> Topologies:
>Sub-topology: 0
> Source: KSTREAM-SOURCE-00 (topics: [parentTopic])
>   --> KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-KEY-SELECT-01 (stores: [])
>   --> KSTREAM-MAPVALUES-02, KSTREAM-MAPVALUES-03, 
> KSTREAM-MAPVALUES-04
>   <-- KSTREAM-SOURCE-00
> Processor: KSTREAM-MAPVALUES-02 (stores: [])
>   --> KSTREAM-MERGE-05
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MAPVALUES-03 (stores: [])
>   --> KSTREAM-MERGE-05
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MAPVALUES-04 (stores: [])
>   --> KSTREAM-MERGE-06
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MERGE-05 (stores: [])
>   --> KSTREAM-MERGE-06
>   <-- KSTREAM-MAPVALUES-02, KSTREAM-MAPVALUES-03
> Processor: KSTREAM-MERGE-06 (stores: [])
>   --> KSTREAM-SINK-07
>   <-- KSTREAM-MERGE-05, KSTREAM-MAPVALUES-04
> Sink: KSTREAM-SINK-07 (topic: outputTopic)
>   <-- KSTREAM-MERGE-06
> {code}
> h3. Actual result
> NullPointerException was thrown with the following stacktrace.
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
>   at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybeUpdateKeyChangingRepartitionNodeMap(InternalStreamsBuilder.java:397)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybeOptimizeRepartitionOperations(InternalStreamsBuilder.java:315)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybePerformOptimizations(InternalStreamsBuilder.java:304)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.buildAndOptimizeTopology(InternalStreamsBuilder.java:275)
>   at 
> org.apache.kafka.streams.StreamsBuilder.build(StreamsBuilder.java:558)
>   at Main.main(Main.java:24){code}
> h3. Cause
> This exception occurs in 
> InternalStreamsBuilder#maybeUpdateKeyChangingRepartitionNodeMap.
> {code:java}
> private void maybeUpdateKeyChangingRepartitionNodeMap() {
>     final Map<StreamsGraphNode, Set<StreamsGraphNode>> mergeNodesToKeyChangers = new HashMap<>();
>     for (final StreamsGraphNode mergeNode : mergeNodes) {
>         mergeNodesToKeyChangers.put(mergeNode, new LinkedHashSet<>());
>         final Collection<StreamsGraphNode> keys = 
> 

[jira] [Comment Edited] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891872#comment-16891872
 ] 

Chandrasekhar edited comment on KAFKA-8706 at 7/24/19 2:08 PM:
---

[~ckamal]: Thanks for the swift response and the references to known issues!

 

Reran the AUT again and found four failures. The following four are 
consistently failing. NOTE: Initially the build ran out of file descriptors, 
so I bumped up the open file descriptor limit to get around that issue before 
the first run.

Do these failures mean anything for actually running software? I.e., the failing tests:

testProduceConsumeViaAssign

testThrottledProducerConsumer

testMetricsDuringTopicCreateDelete

testControlPlaneRequest

 

[^KafkaAUTFailures07242019.txt]!KafkaUTFailures07242019.GIF!


was (Author: chandranc@oracle.com):
Reran the AUT again and found four failures. The following four are 
consistently failing. NOTE: Initially the build ran out of file descriptors, 
so I bumped up the open file descriptor limit to get around that issue before 
the first run.

Do these failures mean anything for actually running software? I.e., the failing tests:

testProduceConsumeViaAssign

testThrottledProducerConsumer

testMetricsDuringTopicCreateDelete

testControlPlaneRequest

 

[^KafkaAUTFailures07242019.txt]!KafkaUTFailures07242019.GIF!

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaUTFailures07242019.GIF
>
>
> Hi
> We have just imported the Kafka 2.3.0 source code from the git repo and are compiling 
> it with Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The failed 
> tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Attached are the failures.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (KAFKA-8705) NullPointerException was thrown by topology optimization when two MergeNodes have common KeyChaingingNode

2019-07-24 Thread Bill Bejeck (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reassigned KAFKA-8705:
--

Assignee: Bill Bejeck

> NullPointerException was thrown by topology optimization when two MergeNodes 
> have common KeyChaingingNode
> -
>
> Key: KAFKA-8705
> URL: https://issues.apache.org/jira/browse/KAFKA-8705
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Hiroshi Nakahara
>Assignee: Bill Bejeck
>Priority: Major
>
> NullPointerException was thrown by topology optimization when two MergeNodes 
> have common KeyChaingingNode.
> Kafka Streams version: 2.3.0
> h3. Code
> {code:java}
> import org.apache.kafka.common.serialization.Serdes;
> import org.apache.kafka.streams.StreamsBuilder;
> import org.apache.kafka.streams.StreamsConfig;
> import org.apache.kafka.streams.kstream.Consumed;
> import org.apache.kafka.streams.kstream.KStream;
> import java.util.Properties;
> public class Main {
>     public static void main(String[] args) {
>         final StreamsBuilder streamsBuilder = new StreamsBuilder();
>         final KStream<Integer, Integer> parentStream = streamsBuilder
>             .stream("parentTopic", Consumed.with(Serdes.Integer(), Serdes.Integer()))
>             .selectKey(Integer::sum);  // To make parentStream a key-changing point
>         final KStream<Integer, Integer> childStream1 = parentStream.mapValues(v -> v + 1);
>         final KStream<Integer, Integer> childStream2 = parentStream.mapValues(v -> v + 2);
>         final KStream<Integer, Integer> childStream3 = parentStream.mapValues(v -> v + 3);
>         childStream1
>             .merge(childStream2)
>             .merge(childStream3)
>             .to("outputTopic");
>         final Properties properties = new Properties();
>         properties.setProperty(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
>         streamsBuilder.build(properties);
>     }
> }
> {code}
> h3. Expected result
> streamsBuilder.build should create a Topology without throwing an exception.  The 
> expected topology is:
> {code:java}
> Topologies:
>Sub-topology: 0
> Source: KSTREAM-SOURCE-00 (topics: [parentTopic])
>   --> KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-KEY-SELECT-01 (stores: [])
>   --> KSTREAM-MAPVALUES-02, KSTREAM-MAPVALUES-03, 
> KSTREAM-MAPVALUES-04
>   <-- KSTREAM-SOURCE-00
> Processor: KSTREAM-MAPVALUES-02 (stores: [])
>   --> KSTREAM-MERGE-05
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MAPVALUES-03 (stores: [])
>   --> KSTREAM-MERGE-05
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MAPVALUES-04 (stores: [])
>   --> KSTREAM-MERGE-06
>   <-- KSTREAM-KEY-SELECT-01
> Processor: KSTREAM-MERGE-05 (stores: [])
>   --> KSTREAM-MERGE-06
>   <-- KSTREAM-MAPVALUES-02, KSTREAM-MAPVALUES-03
> Processor: KSTREAM-MERGE-06 (stores: [])
>   --> KSTREAM-SINK-07
>   <-- KSTREAM-MERGE-05, KSTREAM-MAPVALUES-04
> Sink: KSTREAM-SINK-07 (topic: outputTopic)
>   <-- KSTREAM-MERGE-06
> {code}
> h3. Actual result
> NullPointerException was thrown with the following stacktrace.
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
>   at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybeUpdateKeyChangingRepartitionNodeMap(InternalStreamsBuilder.java:397)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybeOptimizeRepartitionOperations(InternalStreamsBuilder.java:315)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.maybePerformOptimizations(InternalStreamsBuilder.java:304)
>   at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.buildAndOptimizeTopology(InternalStreamsBuilder.java:275)
>   at 
> org.apache.kafka.streams.StreamsBuilder.build(StreamsBuilder.java:558)
>   at Main.main(Main.java:24){code}
> h3. Cause
> This exception occurs in 
> InternalStreamsBuilder#maybeUpdateKeyChangingRepartitionNodeMap.
> {code:java}
> private void maybeUpdateKeyChangingRepartitionNodeMap() {
>     final Map<StreamsGraphNode, Set<StreamsGraphNode>> mergeNodesToKeyChangers = new HashMap<>();
>     for (final StreamsGraphNode mergeNode : mergeNodes) {
>         mergeNodesToKeyChangers.put(mergeNode, new LinkedHashSet<>());
>         final Collection<StreamsGraphNode> keys = keyChangingOperationsToOptimizableRepartitionNodes.keySet();
>         for (final StreamsGraphNode key : keys) 
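
The quoted method is cut off above, but the stack trace already pins down the 
mechanics: AbstractCollection.addAll iterates the collection it is handed, so 
passing it a null lookup result throws the NPE from inside addAll. A 
self-contained toy (toy types, not the Streams graph classes):

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// When two merge nodes share a key-changing parent, a map lookup can come
// back null; feeding that null into addAll() reproduces the reported frame
// java.util.AbstractCollection.addAll(AbstractCollection.java:343).
public class AddAllNpe {
    public static void main(String[] args) {
        final Map<String, Set<String>> keyChangersByMergeNode = new HashMap<>();
        keyChangersByMergeNode.put("merge-1", new HashSet<>());
        final Set<String> target = new HashSet<>();
        // "merge-2" was never registered, so get() returns null...
        target.addAll(keyChangersByMergeNode.get("merge-2")); // ...and addAll throws the NPE
    }
}
{code}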

[jira] [Commented] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891872#comment-16891872
 ] 

Chandrasekhar commented on KAFKA-8706:
--

Reran the AUT again and found four failures. The following four are 
consistently failing. NOTE: Initially the build ran out of file descriptors, 
so I bumped up the open file descriptor limit to get around that issue before 
the first run.

Do these failures mean anything for actually running software? I.e., the failing tests:

testProduceConsumeViaAssign

testThrottledProducerConsumer

testMetricsDuringTopicCreateDelete

testControlPlaneRequest

 

[^KafkaAUTFailures07242019.txt]!KafkaUTFailures07242019.GIF!

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaUTFailures07242019.GIF
>
>
> Hi
> We have just imported the Kafka 2.3.0 source code from the git repo and are compiling 
> it with Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The failed 
> tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Attached are the failures.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar updated KAFKA-8706:
-
Attachment: KafkaUTFailures07242019.GIF

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt, 
> KafkaUTFailures07242019.GIF
>
>
> Hi
> We have just imported the Kafka 2.3.0 source code from the git repo and are compiling 
> it with Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The failed 
> tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Attached are the failures.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Chandrasekhar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar updated KAFKA-8706:
-
Attachment: KafkaAUTFailures07242019.txt

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt, KafkaAUTFailures07242019.txt
>
>
> Hi
> We have just imported the Kafka 2.3.0 source code from the git repo and are compiling 
> it with Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The failed 
> tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Attached are the failures.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8709) hard fail on "Unknown group metadata version"

2019-07-24 Thread Christian Becker (JIRA)
Christian Becker created KAFKA-8709:
---

 Summary: hard fail on "Unknown group metadata version"
 Key: KAFKA-8709
 URL: https://issues.apache.org/jira/browse/KAFKA-8709
 Project: Kafka
  Issue Type: Improvement
Reporter: Christian Becker


We attempted an update from 2.2 to 2.3, and then a rollback was done after 
{{inter.broker.protocol}} was changed. (We know this shouldn't be done, but it 
happened.)

After downgrading to 2.2 again, some {{__consumer-offsets}} partitions fail to 
load with the message {{Unknown group metadata version 3}}. Subsequently the 
broker continues its startup and the consumer groups won't exist, so the 
consumers start at their configured OLDEST or NEWEST position and begin 
committing their offsets.

However, on subsequent restarts of the brokers the {{Unknown group metadata 
version}} exception remains, and so the offset resets happen over and over 
again.

 

In order to prevent this, I'd suggest an updated flow when loading the offsets:
- the loading should continue reading the __consumer-offsets partition to see 
whether a subsequent, readable offset was committed
- if no "valid" offset can be found, throw the existing exception to let the 
operator know about the situation
- if a valid offset can be found, continue as normal

 

This would cause the following sequence of events:
1. corrupted offsets are written
2. broker restart
2a. broker loads offset partition
2b. {{KafkaException}} when loading the offset partition
2c. no "valid" offset is found after the "corrupt" record
2d. offsets reset
3. consumergroups are recreated and "valid" offsets are appended
4. broker restart
4a. broker loads offset partition
4b.  {{KafkaException}} when loading the offset partition
4c. "valid" offset is found after the "corrupted" ones
5. consumergroups still have their latest offset
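
Sketched as a self-contained toy (the real loader works on __consumer_offsets 
records; entries are simulated as strings here), the proposed flow would look 
like:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Keep scanning past unreadable entries; only give up if nothing is readable.
public class TolerantOffsetLoad {
    static int parse(final String entry) {
        if (entry.startsWith("corrupt")) {
            throw new RuntimeException("Unknown group metadata version 3");
        }
        return Integer.parseInt(entry.substring("offset=".length()));
    }

    public static void main(String[] args) {
        final List<String> partition = Arrays.asList("corrupt-v3", "corrupt-v3", "offset=42");
        final List<Integer> recovered = new ArrayList<>();
        int unreadable = 0;
        for (final String entry : partition) {
            try {
                recovered.add(parse(entry));
            } catch (final RuntimeException e) {
                unreadable++; // remember the failure but keep reading
            }
        }
        if (recovered.isEmpty() && unreadable > 0) {
            throw new IllegalStateException("no readable offsets found"); // operator must act
        }
        System.out.println("recovered offsets: " + recovered + " (skipped " + unreadable + ")");
    }
}
{code}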

It's a special case that this happened after human error, but it also poses a 
danger for situations where the offsets might be corrupted for some unrelated 
reason. Losing the offsets is a very serious situation and there should be 
safeguards against it, especially when some offsets might be recoverable. With 
this improvement, the offsets would still be lost once, but the broker would be 
able to recover automatically over time, and after compaction the corrupted 
records would be gone. (In our case this caused serious confusion, as we lost 
the offsets multiple times: the error message {{Error loading offsets from}} 
implies that the corrupted data is deleted and the situation therefore 
recovered, whereas in reality it continues to be an issue until the corrupt 
data is gone once and for all, which might take a long time.)

In our case we managed to evict the broken records by temporarily setting 
the segment time to a very low value and deactivating compaction:
{code:java}
/opt/kafka/bin/kafka-topics.sh --alter --config segment.ms=90 --topic 
__consumer_offsets --zookeeper localhost:2181
/opt/kafka/bin/kafka-topics.sh --alter --config cleanup.policy=delete --topic 
__consumer_offsets --zookeeper localhost:2181
< wait for the cleaner to clean up >
/opt/kafka/bin/kafka-topics.sh --alter --config segment.ms=60480 --topic 
__consumer_offsets --zookeeper localhost:2181
/opt/kafka/bin/kafka-topics.sh --alter --config cleanup.policy=compact --topic 
__consumer_offsets --zookeeper localhost:2181{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8709) improve consumer offsets resiliency

2019-07-24 Thread Christian Becker (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Becker updated KAFKA-8709:

Summary: improve consumer offsets resiliency  (was: hard fail on "Unknown 
group metadata version")

> improve consumer offsets resiliency
> ---
>
> Key: KAFKA-8709
> URL: https://issues.apache.org/jira/browse/KAFKA-8709
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Christian Becker
>Priority: Major
>
> We attempted an update from 2.2 to 2.3, and then a rollback was done 
> after {{inter.broker.protocol}} was changed. (We know this shouldn't be done, 
> but it happened.)
> After downgrading to 2.2 again, some {{__consumer-offsets}} partitions fail 
> to load with the message {{Unknown group metadata version 3}}. Subsequently 
> the broker continues its startup and the consumer groups won't exist, so the 
> consumers start at their configured OLDEST or NEWEST position and 
> begin committing their offsets.
> However, on subsequent restarts of the brokers the {{Unknown group metadata 
> version}} exception remains, and so the offset resets happen over and over 
> again.
>  
> In order to prevent this, I'd suggest an updated flow when loading the offsets:
> - the loading should continue reading the __consumer-offsets partition to see 
> whether a subsequent, readable offset was committed
> - if no "valid" offset can be found, throw the existing exception to let 
> the operator know about the situation
> - if a valid offset can be found, continue as normal
>  
> This would cause the following sequence of events:
> 1. corrupted offsets are written
> 2. broker restart
> 2a. broker loads offset partition
> 2b. {{KafkaException}} when loading the offset partition
> 2c. no "valid" offset is found after the "corrupt" record
> 2d. offsets reset
> 3. consumergroups are recreated and "valid" offsets are appended
> 4. broker restart
> 4a. broker loads offset partition
> 4b.  {{KafkaException}} when loading the offset partition
> 4c. "valid" offset is found after the "corrupted" ones
> 5. consumergroups still have their latest offset
> It's a special case that this happened after human error, but it 
> also poses a danger for situations where the offsets might be corrupted for 
> some unrelated reason. Losing the offsets is a very serious situation and 
> there should be safeguards against it, especially when some offsets might be 
> recoverable. With this improvement, the offsets would still be lost once, but 
> the broker would be able to recover automatically over time, and after 
> compaction the corrupted records would be gone. (In our case this caused 
> serious confusion, as we lost the offsets multiple times: the error message 
> {{Error loading offsets from}} implies that the corrupted data is deleted 
> and the situation therefore recovered, whereas in reality it continues 
> to be an issue until the corrupt data is gone once and for all, which might 
> take a long time.)
> In our case we managed to evict the broken records by temporarily 
> setting the segment time to a very low value and deactivating compaction:
> {code:java}
> /opt/kafka/bin/kafka-topics.sh --alter --config segment.ms=90 --topic 
> __consumer_offsets --zookeeper localhost:2181
> /opt/kafka/bin/kafka-topics.sh --alter --config cleanup.policy=delete --topic 
> __consumer_offsets --zookeeper localhost:2181
> < wait for the cleaner to clean up >
> /opt/kafka/bin/kafka-topics.sh --alter --config segment.ms=60480 --topic 
> __consumer_offsets --zookeeper localhost:2181
> /opt/kafka/bin/kafka-topics.sh --alter --config cleanup.policy=compact 
> --topic __consumer_offsets --zookeeper localhost:2181{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (KAFKA-7213) NullPointerException during state restoration in kafka streams

2019-07-24 Thread Abhishek Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Agarwal reassigned KAFKA-7213:
---

Assignee: (was: Abhishek Agarwal)

> NullPointerException during state restoration in kafka streams
> --
>
> Key: KAFKA-7213
> URL: https://issues.apache.org/jira/browse/KAFKA-7213
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Abhishek Agarwal
>Priority: Major
>
> I had written a custom state store which has a batch restoration callback 
> registered. What I have observed is that when multiple consumer instances are 
> restarted, the application keeps failing with a NullPointerException. The stack 
> trace is: 
> {noformat}
> java.lang.NullPointerException: null
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.putAll(RocksDBStore.java:351)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.state.internals.RocksDBSlotKeyValueBytesStore.putAll(RocksDBSlotKeyValueBytesStore.java:100)
>  ~[streams-core-1.0.0.297.jar:?]
>   at 
> org.apache.kafka.streams.state.internals.RocksDBSlotKeyValueBytesStore$SlotKeyValueBatchRestoreCallback.restoreAll(RocksDBSlotKeyValueBytesStore.java:303)
>  ~[streams-core-1.0.0.297.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.restoreAll(CompositeRestoreListener.java:89)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:75)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:277)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restorePartition(StoreChangelogReader.java:238)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:83)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:263)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:803)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
>  ~[kafka-streams-1.0.0.jar:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
>  ~[kafka-streams-1.0.0.jar:?]
> {noformat}
> The faulty line in question is 
> {noformat}
> db.write(wOptions, batch);
> {noformat}
> in RocksDBStore.java, which would mean the db variable is null. Probably the 
> store has been closed while restoration is still being done on it. After going 
> through the code, I think the problem occurs when the state transitions from 
> PARTITIONS_ASSIGNED to PARTITIONS_REVOKED while restoration is still in 
> progress. 
> In such a state transition, while the active tasks themselves are closed, the 
> changelog reader is not reset. It tries to restore tasks that have 
> already been closed; db is null, and this results in the NPE. 
> I will put in a fix to see if that fixes the issue. 
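
A toy sketch of the kind of open-check guard that would turn the bare NPE into 
a descriptive failure (simplified types, not the actual RocksDBStore):

{code:java}
// If the store was closed while restoration is still running, fail with a
// message that names the real problem instead of dereferencing a null handle.
public class GuardedStore {
    private Object db; // stands in for the RocksDB handle; null once closed

    void putAll(final byte[] batch) {
        if (db == null) {
            throw new IllegalStateException("Store is closed; restoration attempted on a closed task");
        }
        // db.write(wOptions, batch) would go here
    }

    public static void main(String[] args) {
        new GuardedStore().putAll(new byte[0]); // descriptive error instead of an NPE
    }
}
{code}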



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8608) Broker shows WARN on reassignment partitions on new brokers: Replica LEO, follower position & Cache truncation

2019-07-24 Thread Di Campo (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891715#comment-16891715
 ] 

Di Campo commented on KAFKA-8608:
-

Hi Lillian,

I'm afraid I can't. If it happens again, what context do you need? Full broker 
logs for that time window? 

> Broker shows WARN on reassignment partitions on new brokers: Replica LEO, 
> follower position & Cache truncation
> --
>
> Key: KAFKA-8608
> URL: https://issues.apache.org/jira/browse/KAFKA-8608
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.1.1
> Environment: Kafka 2.1.1
>Reporter: Di Campo
>Priority: Minor
>  Labels: broker, reassign, repartition
>
> I added two brokers (brokerId 4,5) to a 3-node (brokerId 1,2,3) cluster with 
> 32 topics of 64 partitions each, replication factor 3.
> I then ran a partition reassignment. 
> On each run, I can see the following WARN messages, but when the partition 
> reassignment process finishes, it all seems OK. ISR is OK (count is 3 for 
> every partition).
> I get the following message types, one per partition:
>  
> {code:java}
> [2019-06-27 12:42:03,946] WARN [LeaderEpochCache visitors-0.0.1-10] New epoch 
> entry EpochEntry(epoch=24, startOffset=51540) caused truncation of 
> conflicting entries ListBuffer(EpochEntry(epoch=22, startOffset=51540)). 
> Cache now contains 5 entries. (kafka.server.epoch.LeaderEpochFileCache) {code}
> -> This relates to cache, so I suppose it's pretty safe.
> {code:java}
> [2019-06-27 12:42:04,250] WARN [ReplicaManager broker=1] Leader 1 failed to 
> record follower 3's position 47981 since the replica is not recognized to be 
> one of the assigned replicas 1,2,5 for partition visitors-0.0.1-28. Empty 
> records will be returned for this partition. 
> (kafka.server.ReplicaManager){code}
> -> This is scary. I'm not sure about the severity of this, but it looks like 
> it may be missing records? 
> {code:java}
> [2019-06-27 12:42:03,709] WARN [ReplicaManager broker=1] While recording the 
> replica LEO, the partition visitors-0.0.1-58 hasn't been created. 
> (kafka.server.ReplicaManager){code}
> -> Here, these partitions are created. 
> First of all: am I supposed to be losing data here? I assume not, and so far 
> I see no trace of losing anything.
> If not, I'm not sure what these messages are trying to say here. Should they 
> really be at WARN level? If so, should the messages clarify the 
> different risks involved? 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8612) Broker removes consumers from CG, Streams app gets stuck

2019-07-24 Thread Di Campo (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891712#comment-16891712
 ] 

Di Campo commented on KAFKA-8612:
-

I've upgraded to 2.3 in the testing environment, and for a week it hasn't happened. 
This wasn't something that happened often either; I'd say maybe once every 2 
weeks. Will update with results.

> Broker removes consumers from CG, Streams app gets stuck
> 
>
> Key: KAFKA-8612
> URL: https://issues.apache.org/jira/browse/KAFKA-8612
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, streams
>Affects Versions: 2.1.1
>Reporter: Di Campo
>Priority: Major
>  Labels: broker, streams, timeout
> Attachments: full-thread-dump-kafka-streams-stuck.log
>
>
> Cluster of 5 brokers, `Kafka 2.1.1`. m5.large (2 CPU, 8GB RAM) instances. 
> Kafka Streams application (`stream-processor`) cluster of 3 instances, 2 
> threads each. `2.1.0` 
> A store consumer group (ClickHouse Kafka Engine from `ClickHouse 
> 19.5.3.8`), with several tables, each consuming from a different topic.
> The `stream-processor` consumes from a source topic and runs a 
> topology of 26 topics (64 partitions each) with 5 state stores, 1 of them 
> a session store, 4 key-value.
> Infra running on docker on AWS ECS. 
> Consuming at a rate of 300-1000 events per second. Each event generates an 
> avg of ~20 extra messages.
> Application has uncaughtExceptionHandler set.
> Timestamps are kept for better analysis.
> `stream-processor` tasks at some point fail to produce to any partition due 
> to timeouts:
> 
> {noformat}
> [2019-06-28 10:04:21,113] ERROR task [1_48] Error sending record (...) to 
> topic (...) due to org.apache.kafka.common.errors.TimeoutException: Expiring 
> 44 record(s) for (...)-48:120002 ms has passed since batch creation; No more 
> records will be sent and no more offsets will be recorded for this task.
> {noformat}
> and "Offset commit failed" errors, in all partitions:
> {noformat}
> [2019-06-28 10:04:27,705] ERROR [Consumer 
> clientId=stream-processor-0.0.1-084f2b82-849a-42b5-a787-f900bbfcb545-StreamThread-1-consumer,
>  groupId=stream-processor-0.0.1] Offset commit failed on partition 
> events-raw-63 at offset 4858803: The request timed out. 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> {noformat}
> _At this point we begin seeing error messages in one of the brokers (see 
> below, Broker logs section)._
> More error messages are shown on the `stream-processor`: 
> {noformat}
> org.apache.kafka.common.errors.TimeoutException: Timeout of 6ms 
> expired before successfully committing offsets 
> {(topic)=OffsetAndMetadata{offset=4858803, leaderEpoch=null, metadata=''}}
> {noformat}
> then hundreds of messages of the following types (one per topic-partition), 
> intertwined: 
> {noformat}
> [2019-06-28 10:05:23,608] WARN [Producer 
> clientId=stream-processor-0.0.1-084f2b82-849a-42b5-a787-f900bbfcb545-StreamThread-3-producer]
>  Got error produce response with correlation id 39946 on topic-partition 
> (topic)-63, retrying (2 attempts left). Error: NETWORK_EXCEPTION 
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}
> {noformat}
> [2019-06-28 10:05:23,609] WARN [Producer 
> clientId=stream-processor-0.0.1-084f2b82-849a-42b5-a787-f900bbfcb545-StreamThread-3-producer]
>  Received invalid metadata error in produce request on partition (topic)1-59 
> due to org.apache.kafka.common.errors.NetworkException: The server 
> disconnected before a response was received.. Going to request metadata 
> update now (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}
> And then: 
> {noformat}
> [2019-06-28 10:05:47,986] ERROR stream-thread 
> [stream-processor-0.0.1-084f2b82-849a-42b5-a787-f900bbfcb545-StreamThread-4] 
> Failed to commit stream task 1_57 due to the following error: 
> (org.apache.kafka.streams.processor.internals.AssignedStreamsTasks)
> 2019-06-28 10:05:47org.apache.kafka.streams.errors.StreamsException: task 
> [1_57] Abort sending since an error caught with a previous record (...) to 
> topic (...) due to org.apache.kafka.common.errors.NetworkException: The 
> server disconnected before a response was received.
> 2019-06-28 10:05:47You can increase producer parameter `retries` and 
> `retry.backoff.ms` to avoid this error.
> 2019-06-28 10:05:47 at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:133)
> {noformat}
> ...and eventually we get to the error messages: 
> {noformat}
> [2019-06-28 10:05:51,198] ERROR [Producer 
> clientId=stream-processor-0.0.1-084f2b82-849a-42b5-a787-f900bbfcb545-StreamThread-3-producer]
>  Uncaught error in kafka producer I/O thread: 
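
The log's own suggestion can be applied from a Streams app via the producer 
prefix; a minimal sketch (values illustrative, not recommendations):

{code:java}
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

// Override the embedded producer's retry behavior through StreamsConfig.
public class RetryConfig {
    public static void main(String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream-processor-0.0.1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(StreamsConfig.producerPrefix("retries"), 10);
        props.put(StreamsConfig.producerPrefix("retry.backoff.ms"), 1000);
        System.out.println(props); // pass these to the KafkaStreams constructor
    }
}
{code}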

[jira] [Created] (KAFKA-8707) Zookeeper Session expired either before or while waiting for connection

2019-07-24 Thread Chethan Bheemaiah (JIRA)
Chethan Bheemaiah created KAFKA-8707:


 Summary: Zookeeper Session expired either before or while waiting 
for connection
 Key: KAFKA-8707
 URL: https://issues.apache.org/jira/browse/KAFKA-8707
 Project: Kafka
  Issue Type: Bug
  Components: zkclient
Affects Versions: 2.0.1
Reporter: Chethan Bheemaiah


Recently we encountered an issue in one of our Kafka clusters. One of the 
nodes went down and was not rejoining the cluster on restart. We observed 
"Session expired" error messages in server.log.

Below is one such message:
ERROR kafka.common.ZkNodeChangeNotificationListener: Error while processing 
notification change for path = /config/changes
kafka.zookeeper.ZooKeeperClientExpiredException: Session expired either before 
or while waiting for connection
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:238)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at 
kafka.zookeeper.ZooKeeperClient.kafka$zookeeper$ZooKeeperClient$$waitUntilConnected(ZooKeeperClient.scala:226)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:220)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply(ZooKeeperClient.scala:220)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply(ZooKeeperClient.scala:220)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at 
kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:219)
at 
kafka.zk.KafkaZkClient.retryRequestsUntilConnected(KafkaZkClient.scala:1510)
at 
kafka.zk.KafkaZkClient.kafka$zk$KafkaZkClient$$retryRequestUntilConnected(KafkaZkClient.scala:1486)
at kafka.zk.KafkaZkClient.getChildren(KafkaZkClient.scala:585)
at 
kafka.common.ZkNodeChangeNotificationListener.kafka$common$ZkNodeChangeNotificationListener$$processNotifications(ZkNodeChangeNotificationListener.scala:82)
at 
kafka.common.ZkNodeChangeNotificationListener$ChangeNotification.process(ZkNodeChangeNotificationListener.scala:119)
at 
kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread.doWork(ZkNodeChangeNotificationListener.scala:145)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
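
This won't fix a genuinely dead session, but the usual first knob when sessions 
expire during GC pauses or slow restarts is the broker's ZooKeeper session 
timeout (values illustrative):

{code}
# server.properties
zookeeper.session.timeout.ms=18000
zookeeper.connection.timeout.ms=18000
{code}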



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8708) Zookeeper Session expired either before or while waiting for connection

2019-07-24 Thread Chethan Bheemaiah (JIRA)
Chethan Bheemaiah created KAFKA-8708:


 Summary: Zookeeper Session expired either before or while waiting 
for connection
 Key: KAFKA-8708
 URL: https://issues.apache.org/jira/browse/KAFKA-8708
 Project: Kafka
  Issue Type: Bug
  Components: zkclient
Affects Versions: 2.0.1
Reporter: Chethan Bheemaiah


Recently we encountered an issue in one of our Kafka clusters. One of the 
nodes went down and was not rejoining the cluster on restart. We observed 
"Session expired" error messages in server.log.

Below is one such message:
ERROR kafka.common.ZkNodeChangeNotificationListener: Error while processing 
notification change for path = /config/changes
kafka.zookeeper.ZooKeeperClientExpiredException: Session expired either before 
or while waiting for connection
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:238)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at 
kafka.zookeeper.ZooKeeperClient.kafka$zookeeper$ZooKeeperClient$$waitUntilConnected(ZooKeeperClient.scala:226)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:220)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply(ZooKeeperClient.scala:220)
at 
kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply(ZooKeeperClient.scala:220)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at 
kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:219)
at 
kafka.zk.KafkaZkClient.retryRequestsUntilConnected(KafkaZkClient.scala:1510)
at 
kafka.zk.KafkaZkClient.kafka$zk$KafkaZkClient$$retryRequestUntilConnected(KafkaZkClient.scala:1486)
at kafka.zk.KafkaZkClient.getChildren(KafkaZkClient.scala:585)
at 
kafka.common.ZkNodeChangeNotificationListener.kafka$common$ZkNodeChangeNotificationListener$$processNotifications(ZkNodeChangeNotificationListener.scala:82)
at 
kafka.common.ZkNodeChangeNotificationListener$ChangeNotification.process(ZkNodeChangeNotificationListener.scala:119)
at 
kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread.doWork(ZkNodeChangeNotificationListener.scala:145)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8706) Kafka 2.3.0 Unit Test Failures on Oracle Linux - Need help debugging framework or issue.

2019-07-24 Thread Kamal Chandraprakash (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891648#comment-16891648
 ] 

Kamal Chandraprakash commented on KAFKA-8706:
-

Kafka has some transient test case failures. You can file JIRAs for the remaining test 
case failures:
 # DescribeConsumerGroupTest.testDescribeOffsetsOfExistingGroupWithNoMembers - 
https://issues.apache.org/jira/browse/KAFKA-7969
 # SaslSslAdminClientIntegrationTest. 
testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords - None Filed
 # UserQuotaTest.testQuotaOverrideDelete - 
https://issues.apache.org/jira/browse/KAFKA-8032
 # UserQuotaTest.testThrottledProducerConsumer - 
https://issues.apache.org/jira/browse/KAFKA-8073
 # MetricsDuringTopicCreationDeletionTest.testMetricsDuringTopicCreateDelete - 
None Filed
 # SocketServerTest.testControlPlaneRequest - None Filed
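
Assuming the standard Gradle test filtering the Kafka build supports, each 
suspect can be re-run in isolation to separate flaky tests from real breakage, 
e.g.:

{code}
./gradlew core:test --tests "kafka.network.SocketServerTest"
./gradlew core:test --tests "kafka.api.UserQuotaTest"
{code}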

> Kafka 2.3.0 Unit Test Failures on Oracle Linux  - Need help debugging 
> framework or issue.
> -
>
> Key: KAFKA-8706
> URL: https://issues.apache.org/jira/browse/KAFKA-8706
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0
>Reporter: Chandrasekhar
>Priority: Minor
> Attachments: KafkaAUTFailures.txt
>
>
> Hi
> We have just imported the Kafka 2.3.0 source code from the git repo and are compiling 
> it with Gradle 4.7 on an Oracle VM with the following info:
> [vagrant@localhost kafka-2.3.0]$ uname -a
>  Linux localhost 4.1.12-112.14.1.el7uek.x86_64 #2 SMP Fri Dec 8 18:37:23 PST 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>  [vagrant@localhost kafka-2.3.0]$
>  
> Upon compiling (#gradle build), there are 6 test failures at the end. The failed 
> tests are reported as follows:
> DescribeConsumerGroupTest. testDescribeOffsetsOfExistingGroupWithNoMembers 
>  SaslSslAdminClientIntegrationTest. 
> testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 
>  UserQuotaTest. testQuotaOverrideDelete 
>  UserQuotaTest. testThrottledProducerConsumer 
>  MetricsDuringTopicCreationDeletionTest. testMetricsDuringTopicCreateDelete 
>  SocketServerTest. testControlPlaneRequest
> Attached are the failures.
>  
> [^KafkaAUTFailures.txt]
>  
>  
>  We would like to know if we are missing anything in our build environment or 
> if these are known test failures in Kafka 2.3.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8697) Kafka consumer group auto removal

2019-07-24 Thread Kamal Chandraprakash (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891642#comment-16891642
 ] 

Kamal Chandraprakash commented on KAFKA-8697:
-

[~progovoy]

[KIP-496|https://cwiki.apache.org/confluence/display/KAFKA/KIP-496%3A+Administrative+API+to+delete+consumer+offsets]
 was proposed today; it adds an option to the AdminClient to delete inactive 
consumer group offsets.
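
Until that lands, an already-empty group can be removed explicitly with the 
existing admin API; a minimal sketch (broker address assumed):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

// Deletes the idle group from the description so monitoring stops alerting.
public class DeleteGroup {
    public static void main(String[] args) throws Exception {
        final Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            admin.deleteConsumerGroups(Collections.singletonList("ABC")).all().get();
        }
    }
}
{code}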

> Kafka consumer group auto removal
> -
>
> Key: KAFKA-8697
> URL: https://issues.apache.org/jira/browse/KAFKA-8697
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Pavel Rogovoy
>Priority: Major
>
> Hello everyone,
> I'm new to Kafka so please be gentle with me :)
> Current issue:
> Let's say I have a consumer that consumes messages as part of a consumer group 
> named 'ABC' and decides to terminate. Consumer group 'ABC' will now stay there 
> hanging with zero consumers. This situation will cause monitoring tools like 
> Burrow to alert on lag for this consumer group even though my application 
> has finished its job, doesn't want to do anything, and thus is not actually lagging. 
>  
> I think it would be useful to add an option to create a consumer group 
> that is automatically removed when the last consumer has terminated 
> properly and did not crash.
>  
> Please tell me what you think?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)