[jira] [Resolved] (KAFKA-8252) topic catalog.offer-updated.1 has an unexpected message

2019-04-17 Thread zan fu (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zan fu resolved KAFKA-8252.
---
Resolution: Not A Bug

> topic catalog.offer-updated.1 has an unexpected message
> ---
>
> Key: KAFKA-8252
> URL: https://issues.apache.org/jira/browse/KAFKA-8252
> Project: Kafka
>  Issue Type: Bug
>Reporter: zan fu
>Priority: Major
>
> refer to https://issues.apache.org/jira/browse/KAFKA-4740's



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8248) Producer may fail IllegalStateException

2019-04-17 Thread Ayusman Dikshit (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820743#comment-16820743
 ] 

Ayusman Dikshit commented on KAFKA-8248:


Does this issue apply ONLY to 2.0.0, or to later versions as well?

> Producer may fail IllegalStateException
> ---
>
> Key: KAFKA-8248
> URL: https://issues.apache.org/jira/browse/KAFKA-8248
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 2.0.0
>Reporter: Matthias J. Sax
>Priority: Major
>
> In a Kafka Streams application, we observed the following log from the 
> producer:
> {quote}2019-04-17T01:58:25.898Z 17466081 [kafka-producer-network-thread | 
> client-id-enrichment-gcoint-StreamThread-7-0_10-producer] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - [Producer 
> clientId=client-id-enrichment-gcoint-StreamThread-7-0_10-producer, 
> transactionalId=application-id-enrichment-gcoint-0_10] Uncaught error in 
> kafka producer I/O thread: 
> 2019-04-17T01:58:25.898Z java.lang.IllegalStateException: Attempt to send a 
> request to node 1 which is not ready.
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:430)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:411)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:362)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:214)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
> 2019-04-17T01:58:25.898Z at java.lang.Thread.run(Thread.java:748)
> {quote}
> Later, Kafka Streams (running with EOS enabled) shuts down with a 
> `TimeoutException` that occurs during rebalance. It seems that the above error 
> results in this `TimeoutException`. However, an `IllegalStateException` seems 
> to indicate a bug in the producer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8252) topic catalog.offer-updated.1 has an unexpected message

2019-04-17 Thread zan fu (JIRA)
zan fu created KAFKA-8252:
-

 Summary: topic catalog.offer-updated.1 has an unexpected message
 Key: KAFKA-8252
 URL: https://issues.apache.org/jira/browse/KAFKA-8252
 Project: Kafka
  Issue Type: Bug
Reporter: zan fu


refer to https://issues.apache.org/jira/browse/KAFKA-4740's



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8246) refactor topic/group instance id validation condition

2019-04-17 Thread Boyang Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820708#comment-16820708
 ] 

Boyang Chen commented on KAFKA-8246:


Interesting, thanks for the info! [~sliebau] Cc [~hachikuji]

> refactor topic/group instance id validation condition
> -
>
> Key: KAFKA-8246
> URL: https://issues.apache.org/jira/browse/KAFKA-8246
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8153) Streaming application with state stores takes up to 1 hour to restart

2019-04-17 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820680#comment-16820680
 ] 

Matthias J. Sax commented on KAFKA-8153:


[~mmelsen] Any updates on this? I am wondering if you agree with my assessment 
or not?

> Streaming application with state stores takes up to 1 hour to restart
> -
>
> Key: KAFKA-8153
> URL: https://issues.apache.org/jira/browse/KAFKA-8153
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0
>Reporter: Michael Melsen
>Priority: Major
>
> We are using spring cloud stream with Kafka streams 2.0.1 and utilizing the 
> InteractiveQueryService to fetch data from the stores. There are 4 stores 
> that persist data on disk after aggregating data. The code for the topology 
> looks like this:
> {code:java}
> @Slf4j
> @EnableBinding(SensorMeasurementBinding.class)
> public class Consumer {
>
>   public static final String RETENTION_MS = "retention.ms";
>   public static final String CLEANUP_POLICY = "cleanup.policy";
>
>   @Value("${windowstore.retention.ms}")
>   private String retention;
>
>   /**
>    * Process the data flowing in from a Kafka topic. Aggregate the data to:
>    * - 2 minutes
>    * - 15 minutes
>    * - one hour
>    * - 12 hours
>    *
>    * @param stream
>    */
>   @StreamListener(SensorMeasurementBinding.ERROR_SCORE_IN)
>   public void process(KStream<String, SensorMeasurement> stream) {
>     Map<String, String> topicConfig = new HashMap<>();
>     topicConfig.put(RETENTION_MS, retention);
>     topicConfig.put(CLEANUP_POLICY, "delete");
>     log.info("Changelog and local window store retention.ms: {} and cleanup.policy: {}",
>         topicConfig.get(RETENTION_MS),
>         topicConfig.get(CLEANUP_POLICY));
>     createWindowStore(LocalStore.TWO_MINUTES_STORE, topicConfig, stream);
>     createWindowStore(LocalStore.FIFTEEN_MINUTES_STORE, topicConfig, stream);
>     createWindowStore(LocalStore.ONE_HOUR_STORE, topicConfig, stream);
>     createWindowStore(LocalStore.TWELVE_HOURS_STORE, topicConfig, stream);
>   }
>
>   private void createWindowStore(
>       LocalStore localStore,
>       Map<String, String> topicConfig,
>       KStream<String, SensorMeasurement> stream) {
>     // Configure how the state store should be materialized using the provided storeName
>     Materialized<String, ErrorScore, WindowStore<Bytes, byte[]>> materialized = Materialized
>         .as(localStore.getStoreName());
>     // Set retention of the changelog topic
>     materialized.withLoggingEnabled(topicConfig);
>     // Configure what the windows look like and how long data will be retained in local stores
>     TimeWindows configuredTimeWindows = getConfiguredTimeWindows(
>         localStore.getTimeUnit(),
>         Long.parseLong(topicConfig.get(RETENTION_MS)));
>     // Processing description:
>     // The input data are 'samples' with key :::
>     // 1. With the map we add the Tag to the key and extract the error score from the data
>     // 2. With the groupByKey we group the data on the new key
>     // 3. With windowedBy we split up the data in time intervals depending on the provided LocalStore enum
>     // 4. With reduce we determine the maximum value in the time window
>     // 5. Materialized will make it stored in a table
>     stream
>         .map(getInstallationAssetModelAlgorithmTagKeyMapper())
>         .groupByKey()
>         .windowedBy(configuredTimeWindows)
>         .reduce((aggValue, newValue) -> getMaxErrorScore(aggValue, newValue), materialized);
>   }
>
>   private TimeWindows getConfiguredTimeWindows(long windowSizeMs, long retentionMs) {
>     TimeWindows timeWindows = TimeWindows.of(windowSizeMs);
>     timeWindows.until(retentionMs);
>     return timeWindows;
>   }
>
>   /**
>    * Determine the max error score to keep by looking at the aggregated error signal and
>    * the freshly consumed error signal
>    *
>    * @param aggValue
>    * @param newValue
>    * @return
>    */
>   private ErrorScore getMaxErrorScore(ErrorScore aggValue, ErrorScore newValue) {
>     if (aggValue.getErrorSignal() > newValue.getErrorSignal()) {
>       return aggValue;
>     }
>     return newValue;
>   }
>
>   private KeyValueMapper<String, SensorMeasurement, KeyValue<String, ErrorScore>>
>       getInstallationAssetModelAlgorithmTagKeyMapper() {
>     return (s, sensorMeasurement) -> new KeyValue<>(s + "::" + sensorMeasurement.getT(),
>         new ErrorScore(sensorMeasurement.getTs(), sensorMeasurement.getE(),
>             sensorMeasurement.getO()));
>   }
> }
> {code}
> So we are materializing aggregated data to four different stores after 
> determining the max value within a specific window for a specific key. Please 
> note that retention is set to two months of data and the cleanup policy to 
> delete. We don't compact data.
> The size of the individual state stores on disk is between 14 to 20 gb of 
> data.
> We 

[jira] [Commented] (KAFKA-8155) Update Streams system tests for 2.2.0 and 2.1.1 releases

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820671#comment-16820671
 ] 

ASF GitHub Bot commented on KAFKA-8155:
---

mjsax commented on pull request #6597: KAFKA-8155: Add 2.2.0 release to system 
tests
URL: https://github.com/apache/kafka/pull/6597
 
 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update Streams system tests for 2.2.0 and 2.1.1 releases
> 
>
> Key: KAFKA-8155
> URL: https://issues.apache.org/jira/browse/KAFKA-8155
> Project: Kafka
>  Issue Type: Task
>  Components: streams, system tests
>Reporter: John Roesler
>Assignee: Matthias J. Sax
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8155) Update Streams system tests for 2.2.0 and 2.1.1 releases

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820666#comment-16820666
 ] 

ASF GitHub Bot commented on KAFKA-8155:
---

mjsax commented on pull request #6596:  KAFKA-8155: Add 2.1.1 release to system 
tests
URL: https://github.com/apache/kafka/pull/6596
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update Streams system tests for 2.2.0 and 2.1.1 releases
> 
>
> Key: KAFKA-8155
> URL: https://issues.apache.org/jira/browse/KAFKA-8155
> Project: Kafka
>  Issue Type: Task
>  Components: streams, system tests
>Reporter: John Roesler
>Assignee: Matthias J. Sax
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8251) Consider different logging for purgeData

2019-04-17 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8251:
--

 Summary: Consider different logging for purgeData
 Key: KAFKA-8251
 URL: https://issues.apache.org/jira/browse/KAFKA-8251
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Matthias J. Sax


As reported in this SO question 
[https://stackoverflow.com/questions/55724259/kafka-streams-floods-kafka-logs/55738257#55738257],
each time data is purged from a topic, a log message is written:

> {{[2019-04-17 09:06:16,541] INFO [Log partition=my-application-KSTREAM-AGGREGATE-STATE-STORE-76-repartition-0, dir=/opt/kafka/data/logs] Incrementing log start offset to 316423 (kafka.log.Log)}}
> {{[2019-04-17 09:06:16,545] INFO [Log partition=my-application-KSTREAM-AGGREGATE-STATE-STORE-33-repartition-2, dir=/opt/kafka/data/logs] Incrementing log start offset to 3394 (kafka.log.Log)}}

While this is useful for regular log truncation, it seems less useful for 
explicit purge-data requests. Kafka Streams uses the purge-data API aggressively 
on repartition topics, and this can result in verbose logging.

I don't know this part of the code, but maybe it's possible to distinguish the 
two cases: keep time/size-based truncation at INFO level, and reduce the log 
level for purge-data requests to TRACE?
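
For illustration, a minimal sketch of the idea in Java (the actual broker 
logging lives in Scala's kafka.log.Log; the enum and method names below are 
hypothetical):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch: thread the *reason* for a log-start-offset change
// through to the logging call and pick the log level from it.
public class LogStartOffsetLogging {
    private static final Logger log = LoggerFactory.getLogger(LogStartOffsetLogging.class);

    enum Reason { SEGMENT_DELETION, DELETE_RECORDS_REQUEST }

    private long logStartOffset = 0L;

    void maybeIncrementLogStartOffset(long newStartOffset, Reason reason) {
        if (newStartOffset > logStartOffset) {
            logStartOffset = newStartOffset;
            if (reason == Reason.SEGMENT_DELETION) {
                // time/size-based truncation stays at INFO, as today
                log.info("Incrementing log start offset to {}", newStartOffset);
            } else {
                // explicit purge-data requests (e.g. from Kafka Streams on
                // repartition topics) are frequent, so demote them to TRACE
                log.trace("Incrementing log start offset to {} due to {}",
                        newStartOffset, reason);
            }
        }
    }
}
{code}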



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-4212) Add a key-value store that is a TTL persistent cache

2019-04-17 Thread Chris Toomey (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820644#comment-16820644
 ] 

Chris Toomey commented on KAFKA-4212:
-

Our use case is we're using RocksDB as a key/value store cache of a topic and 
we need the cache to delete the messages when the underlying topic does (via 
configured retention time). One way to achieve this would be to have the broker 
publish tombstone records for the deleted messages or provide a means for some 
other entity to do so.
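
As a sketch of the "repurpose the window store as a TTL cache" trick described 
in the issue below, a lookup via the Streams interactive-query API; store name 
and String key/value types are made up for illustration, and it assumes one 
value per key per window:

{code:java}
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class TtlCacheSketch {
    // Return the most recent value for 'key' written within the last ttlMs,
    // or null if every entry has already expired out of the window store.
    static String getWithTtl(KafkaStreams streams, String storeName, String key, long ttlMs) {
        ReadOnlyWindowStore<String, String> store =
            streams.store(storeName, QueryableStoreTypes.<String, String>windowStore());
        long now = System.currentTimeMillis();
        String latest = null;
        try (WindowStoreIterator<String> iter = store.fetch(key, now - ttlMs, now)) {
            while (iter.hasNext()) {
                latest = iter.next().value;  // iterator is oldest-first; keep the newest
            }
        }
        return latest;
    }
}
{code}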

> Add a key-value store that is a TTL persistent cache
> 
>
> Key: KAFKA-4212
> URL: https://issues.apache.org/jira/browse/KAFKA-4212
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Elias Levy
>Priority: Major
>  Labels: api
>
> Some jobs need to maintain as state a large set of key-values for some 
> period of time.  I.e. they need to maintain a TTL cache of values potentially 
> larger than memory. 
> Currently Kafka Streams provides non-windowed and windowed key-value stores.  
> Neither is an exact fit to this use case.  
> The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as 
> required, but does not support expiration.  The TTL option of RocksDB is 
> explicitly not used.
> The {{RocksDBWindowsStore}}, a {{WindowsStore}}, can expire items via segment 
> dropping, but it stores multiple items per key, based on their timestamp.  
> But this store can be repurposed as a cache by fetching the items in reverse 
> chronological order and returning the first item found.
> KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here 
> we desire a variable-capacity memory-overflowing TTL caching store.
> Although {{RocksDBWindowsStore}} can be repurposed as a cache, it would be 
> useful to have an official and proper TTL cache API and implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8250) Flaky Test DelegationTokenEndToEndAuthorizationTest#testProduceConsumeViaAssign

2019-04-17 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8250:
--

 Summary: Flaky Test 
DelegationTokenEndToEndAuthorizationTest#testProduceConsumeViaAssign
 Key: KAFKA-8250
 URL: https://issues.apache.org/jira/browse/KAFKA-8250
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.3.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0


[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk11/detail/kafka-trunk-jdk11/442/tests]
{quote}java.lang.AssertionError: Consumed more records than expected 
expected:<1> but was:<2>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1288)
at 
kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:460)
at 
kafka.api.EndToEndAuthorizationTest.testProduceConsumeViaAssign(EndToEndAuthorizationTest.scala:209){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7647) Flaky test LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic

2019-04-17 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820612#comment-16820612
 ] 

Matthias J. Sax commented on KAFKA-7647:


https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3564/tests

> Flaky test 
> LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic
> -
>
> Key: KAFKA-7647
> URL: https://issues.apache.org/jira/browse/KAFKA-7647
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core, unit tests
>Affects Versions: 2.3.0
>Reporter: Dong Lin
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0
>
>
> {code}
> kafka.log.LogCleanerParameterizedIntegrationTest >
> testCleansCombinedCompactAndDeleteTopic[3] FAILED
>     java.lang.AssertionError: Contents of the map shouldn't change
> expected:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 ->
> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353),
> 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 ->
> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 ->
> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but
> was:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354),
> 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 ->
> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 ->
> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 ->
> (299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 ->
> (355,355))>
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:834)
>         at org.junit.Assert.assertEquals(Assert.java:118)
>         at
> kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic(LogCleanerParameterizedIntegrationTest.scala:129)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8248) Producer may fail IllegalStateException

2019-04-17 Thread Jason Gustafson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820597#comment-16820597
 ] 

Jason Gustafson commented on KAFKA-8248:


Looking at the code, I think the problem might be that we don't update the time 
before calling `client.send()` inside `maybeSendTransactionalRequest`. 
Basically, `NetworkClient.isReady` might return a different value at different 
times, so if the latest time is not used, the readiness check can give a stale 
answer.
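
Roughly, the suspected pattern, rendered as a simplified, hypothetical sketch 
(not the actual Sender code):

{code:java}
import org.apache.kafka.clients.ClientRequest;
import org.apache.kafka.clients.NetworkClient;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.utils.Time;

public class StaleTimestampSketch {
    // isReady() is evaluated with one timestamp; if 'now' is not refreshed
    // before send(), doSend() re-validates readiness against newer connection
    // state and can throw "Attempt to send a request to node ... which is not
    // ready", i.e. the IllegalStateException quoted above.
    void sendIfReady(NetworkClient client, Node node, ClientRequest request, Time time) {
        long now = time.milliseconds();
        if (client.isReady(node, now)) {
            // ... intervening work; wall-clock time moves on ...
            client.send(request, time.milliseconds());  // refresh the timestamp here
        }
    }
}
{code}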

> Producer may fail IllegalStateException
> ---
>
> Key: KAFKA-8248
> URL: https://issues.apache.org/jira/browse/KAFKA-8248
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 2.0.0
>Reporter: Matthias J. Sax
>Priority: Major
>
> In a Kafka Streams application, we observed the following log from the 
> producer:
> {quote}2019-04-17T01:58:25.898Z 17466081 [kafka-producer-network-thread | 
> client-id-enrichment-gcoint-StreamThread-7-0_10-producer] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - [Producer 
> clientId=client-id-enrichment-gcoint-StreamThread-7-0_10-producer, 
> transactionalId=application-id-enrichment-gcoint-0_10] Uncaught error in 
> kafka producer I/O thread: 
> 2019-04-17T01:58:25.898Z java.lang.IllegalStateException: Attempt to send a 
> request to node 1 which is not ready.
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:430)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:411)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:362)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:214)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
> 2019-04-17T01:58:25.898Z at java.lang.Thread.run(Thread.java:748)
> {quote}
> Later, Kafka Streams (running with EOS enabled) shuts down with a 
> `TimeoutException` that occurs during rebalance. It seems that the above error 
> results in this `TimeoutException`. However, an `IllegalStateException` seems 
> to indicate a bug in the producer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-8185) Controller becomes stale and not able to failover the leadership for the partitions

2019-04-17 Thread Dhruvil Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820573#comment-16820573
 ] 

Dhruvil Shah edited comment on KAFKA-8185 at 4/17/19 11:45 PM:
---

I investigated this issue with Kang and after looking through older logs, the 
most logical explanation is that the controller got into an invalid state after 
the topic znode was deleted directly from ZK. This is not an expected scenario 
as we expect users to use the `delete-topics` command, AdminClient or the 
DeleteTopicsRequest to delete a topic.


was (Author: dhruvilshah):
I investigated this issue with Kang and after looking through older logs, the 
most logical explanation is that the controller got into an invalid state after 
the topic znode was deleted from ZK. This is not an expected scenario as we 
expect users to use the `delete-topics` command, AdminClient or the 
DeleteTopicsRequest to delete a topic.

> Controller becomes stale and not able to failover the leadership for the 
> partitions
> ---
>
> Key: KAFKA-8185
> URL: https://issues.apache.org/jira/browse/KAFKA-8185
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.1.1
>Reporter: Kang H Lee
>Priority: Critical
> Attachments: broker12.zip, broker9.zip, zookeeper.zip
>
>
> Description:
> After broker 9 went offline, all partitions led by it went offline. The 
> controller attempted to move leadership but ran into an exception while doing 
> so:
> {code:java}
> // [2019-03-26 01:23:34,114] ERROR [PartitionStateMachine controllerId=12] 
> Error while moving some partitions to OnlinePartition state 
> (kafka.controller.PartitionStateMachine)
> java.util.NoSuchElementException: key not found: me-test-1
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:59)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$14.apply(PartitionStateMachine.scala:202)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$14.apply(PartitionStateMachine.scala:202)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> kafka.controller.PartitionStateMachine.initializeLeaderAndIsrForPartitions(PartitionStateMachine.scala:202)
> at 
> kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:167)
> at 
> kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
> at 
> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:106)
> at 
> kafka.controller.KafkaController.kafka$controller$KafkaController$$onReplicasBecomeOffline(KafkaController.scala:437)
> at 
> kafka.controller.KafkaController.kafka$controller$KafkaController$$onBrokerFailure(KafkaController.scala:405)
> at 
> kafka.controller.KafkaController$BrokerChange$.process(KafkaController.scala:1246)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:69)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:68)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> {code}
> The controller was unable to move leadership of partitions led by broker 9 as 
> a result. It's worth noting that the controller ran into the same exception 
> when the broker came back up online. The controller thinks `me-test-1` is a 
> new partition and when attempting to transition it to an online partition, it 
> is unable to retrieve its replica assignment from 
> ControllerContext#partitionReplicaAssignment. I need to look through the code 
> to figure out if there's a race condition or situations where we remove the 
> partition from ControllerContext#partitionReplicaAssignment but might still 
> leave it in PartitionStateMachine#partitionState.
> They had to change the controller to recover from the offline status.
> Sequential event:

[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820588#comment-16820588
 ] 

ASF GitHub Bot commented on KAFKA-7652:
---

guozhangwang commented on pull request #6448: KAFKA-7652: Restrict range of 
fetch/findSessions in cache
URL: https://github.com/apache/kafka/pull/6448
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -
>
> Key: KAFKA-7652
> URL: https://issues.apache.org/jira/browse/KAFKA-7652
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>Reporter: Jonathan Gordon
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: kip
> Attachments: 0.10.2.1-NamedCache.txt, 2.2.0-rc0_b-NamedCache.txt, 
> 2.3.0-7652-NamedCache.txt, kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> are experiencing severe performance degradation. The highest amount of CPU time 
> seems spent in retrieving from the local cache. Here's an example thread 
> profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation reveals that it appears we're performing about 3 orders of 
> magnitude more lookups on the NamedCache over a comparable time period. I've 
> attached logs of the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  
> KIP-420: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-8185) Controller becomes stale and not able to failover the leadership for the partitions

2019-04-17 Thread Dhruvil Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820573#comment-16820573
 ] 

Dhruvil Shah edited comment on KAFKA-8185 at 4/17/19 11:42 PM:
---

I investigated this issue with Kang and after looking through older logs, the 
most logical explanation is that the controller got into an invalid state after 
the topic znode was deleted from ZK. This is not an expected scenario as we 
expect users to use the `delete-topics` command, AdminClient or the 
DeleteTopicsRequest to delete a topic.


was (Author: dhruvilshah):
This is not a typically expected scenario and would only happen when the topic 
znode is deleted directly from ZK.

> Controller becomes stale and not able to failover the leadership for the 
> partitions
> ---
>
> Key: KAFKA-8185
> URL: https://issues.apache.org/jira/browse/KAFKA-8185
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.1.1
>Reporter: Kang H Lee
>Priority: Critical
> Attachments: broker12.zip, broker9.zip, zookeeper.zip
>
>
> Description:
> After broker 9 went offline, all partitions led by it went offline. The 
> controller attempted to move leadership but ran into an exception while doing 
> so:
> {code:java}
> // [2019-03-26 01:23:34,114] ERROR [PartitionStateMachine controllerId=12] 
> Error while moving some partitions to OnlinePartition state 
> (kafka.controller.PartitionStateMachine)
> java.util.NoSuchElementException: key not found: me-test-1
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:59)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$14.apply(PartitionStateMachine.scala:202)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$14.apply(PartitionStateMachine.scala:202)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> kafka.controller.PartitionStateMachine.initializeLeaderAndIsrForPartitions(PartitionStateMachine.scala:202)
> at 
> kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:167)
> at 
> kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
> at 
> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:106)
> at 
> kafka.controller.KafkaController.kafka$controller$KafkaController$$onReplicasBecomeOffline(KafkaController.scala:437)
> at 
> kafka.controller.KafkaController.kafka$controller$KafkaController$$onBrokerFailure(KafkaController.scala:405)
> at 
> kafka.controller.KafkaController$BrokerChange$.process(KafkaController.scala:1246)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:69)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:68)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> {code}
> The controller was unable to move leadership of partitions led by broker 9 as 
> a result. It's worth noting that the controller ran into the same exception 
> when the broker came back up online. The controller thinks `me-test-1` is a 
> new partition and when attempting to transition it to an online partition, it 
> is unable to retrieve its replica assignment from 
> ControllerContext#partitionReplicaAssignment. I need to look through the code 
> to figure out if there's a race condition or situations where we remove the 
> partition from ControllerContext#partitionReplicaAssignment but might still 
> leave it in PartitionStateMachine#partitionState.
> They had to change the controller to recover from the offline status.
> Sequential event:
> * Broker 9 got restarted between 2019-03-26 01:22:54,236 and 2019-03-26 
> 01:27:30,967: this was an unclean shutdown.
> * From 2019-03-26 01:27:30,967, broker 9 was rebuilding indexes. Broker 9 
> wasn't able to process data at this 

[jira] [Created] (KAFKA-8249) partition reassignment may never finish if topic deletion completes first

2019-04-17 Thread xiongqi wu (JIRA)
xiongqi wu created KAFKA-8249:
-

 Summary: partition reassignment may never finish if topic deletion 
completes first 
 Key: KAFKA-8249
 URL: https://issues.apache.org/jira/browse/KAFKA-8249
 Project: Kafka
  Issue Type: Bug
Reporter: xiongqi wu
Assignee: xiongqi wu


Kafka allows topic deletion to complete successfully when there are pending 
partition reassignments for the same topic (if the topic deletion request comes 
after the partition reassignment).

This leads to several issues: 1) pending partition reassignments of a deleted 
topic never complete because the topic is deleted; 2) onPartitionReassignment -> 
updateAssignedReplicasForPartition throws an IllegalStateException for the 
non-existing node, which in turn causes the controller not to resume topic 
deletion for online brokers and also to fail to register broker notification 
handlers (etc.) during onBrokerStartup.

To fix this, we need to clean up pending partition reassignments during topic 
deletion.
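
Conceptually, the cleanup could look like the following sketch (written in Java 
for illustration only; the controller itself is Scala and these names are 
hypothetical):

{code:java}
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

public class ReassignmentCleanupSketch {
    // On topic deletion, drop any pending reassignments for that topic so
    // they cannot linger forever once the topic's partitions are gone.
    static void onTopicDeletion(String topic,
                                Map<TopicPartition, List<Integer>> partitionsBeingReassigned) {
        partitionsBeingReassigned.keySet().removeIf(tp -> tp.topic().equals(topic));
        // ... continue with the normal topic deletion path ...
    }
}
{code}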

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8185) Controller becomes stale and not able to failover the leadership for the partitions

2019-04-17 Thread Dhruvil Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhruvil Shah resolved KAFKA-8185.
-
Resolution: Not A Problem

This is not a typically expected scenario and would only happen when the topic 
znode is deleted directly from ZK.

> Controller becomes stale and not able to failover the leadership for the 
> partitions
> ---
>
> Key: KAFKA-8185
> URL: https://issues.apache.org/jira/browse/KAFKA-8185
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.1.1
>Reporter: Kang H Lee
>Priority: Critical
> Attachments: broker12.zip, broker9.zip, zookeeper.zip
>
>
> Description:
> After broker 9 went offline, all partitions led by it went offline. The 
> controller attempted to move leadership but ran into an exception while doing 
> so:
> {code:java}
> // [2019-03-26 01:23:34,114] ERROR [PartitionStateMachine controllerId=12] 
> Error while moving some partitions to OnlinePartition state 
> (kafka.controller.PartitionStateMachine)
> java.util.NoSuchElementException: key not found: me-test-1
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:59)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$14.apply(PartitionStateMachine.scala:202)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$14.apply(PartitionStateMachine.scala:202)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> kafka.controller.PartitionStateMachine.initializeLeaderAndIsrForPartitions(PartitionStateMachine.scala:202)
> at 
> kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:167)
> at 
> kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
> at 
> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:106)
> at 
> kafka.controller.KafkaController.kafka$controller$KafkaController$$onReplicasBecomeOffline(KafkaController.scala:437)
> at 
> kafka.controller.KafkaController.kafka$controller$KafkaController$$onBrokerFailure(KafkaController.scala:405)
> at 
> kafka.controller.KafkaController$BrokerChange$.process(KafkaController.scala:1246)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:69)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:68)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> {code}
> The controller was unable to move leadership of partitions led by broker 9 as 
> a result. It's worth noting that the controller ran into the same exception 
> when the broker came back up online. The controller thinks `me-test-1` is a 
> new partition and when attempting to transition it to an online partition, it 
> is unable to retrieve its replica assignment from 
> ControllerContext#partitionReplicaAssignment. I need to look through the code 
> to figure out if there's a race condition or situations where we remove the 
> partition from ControllerContext#partitionReplicaAssignment but might still 
> leave it in PartitionStateMachine#partitionState.
> They had to change the controller to recover from the offline status.
> Sequential event:
> * Broker 9 got restarted between 2019-03-26 01:22:54,236 and 2019-03-26 
> 01:27:30,967: this was an unclean shutdown.
> * From 2019-03-26 01:27:30,967, broker 9 was rebuilding indexes. Broker 9 
> wasn't able to process data at this moment.
> * At 2019-03-26 01:29:36,741, broker 9 was starting to load replica.
> * [2019-03-26 01:29:36,202] ERROR [KafkaApi-9] Number of alive brokers '0' 
> does not meet the required replication factor '3' for the offsets topic 
> (configured via 'offsets.topic.replication.factor'). This error can be 
> ignored if the cluster is starting up and not all brokers are up yet. 
> (kafka.server.KafkaApis)
> * At 2019-03-26 

[jira] [Updated] (KAFKA-8248) Producer may fail IllegalStateException

2019-04-17 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8248:
---
Description: 
In a Kafka Streams application, we observed the following log from the producer:
{quote}2019-04-17T01:58:25.898Z 17466081 [kafka-producer-network-thread | 
client-id-enrichment-gcoint-StreamThread-7-0_10-producer] ERROR 
org.apache.kafka.clients.producer.internals.Sender - [Producer 
clientId=client-id-enrichment-gcoint-StreamThread-7-0_10-producer, 
transactionalId=application-id-enrichment-gcoint-0_10] Uncaught error in kafka 
producer I/O thread: 
2019-04-17T01:58:25.898Z java.lang.IllegalStateException: Attempt to send a 
request to node 1 which is not ready.
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:430)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:411)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:362)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:214)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
2019-04-17T01:58:25.898Z at java.lang.Thread.run(Thread.java:748)
{quote}
Later, Kafka Streams (running with EOS enabled) shuts down with a 
`TimeoutException` that occurs during rebalance. It seems that the above error 
results in this `TimeoutException`. However, an `IllegalStateException` seems 
to indicate a bug in the producer.

  was:
In a Kafka Streams application, we observed the following log from the producer:
{quote}2019-04-17T01:58:25.898Z 17466081 [kafka-producer-network-thread | 
client-id-enrichment-gcoint-StreamThread-7-0_10-producer] ERROR 
org.apache.kafka.clients.producer.internals.Sender - [Producer 
clientId=client-id-enrichment-gcoint-StreamThread-7-0_10-producer, 
transactionalId=application-id-enrichment-gcoint-0_10] Uncaught error in kafka 
producer I/O thread: 
2019-04-17T01:58:25.898Z java.lang.IllegalStateException: Attempt to send a 
request to node 1 which is not ready.
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:430)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:411)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:362)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:214)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
2019-04-17T01:58:25.898Z at java.lang.Thread.run(Thread.java:748)
{quote}
Later, Kafka Streams shuts down with a `TimeoutException` that occurs during 
rebalance. It seems that the above error results in this `TimeoutException`. 
However, an `IllegalStateException` seems to indicate a bug in the producer.


> Producer may fail IllegalStateException
> ---
>
> Key: KAFKA-8248
> URL: https://issues.apache.org/jira/browse/KAFKA-8248
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 2.0.0
>Reporter: Matthias J. Sax
>Priority: Major
>
> In a Kafka Streams application, we observed the following log from the 
> producer:
> {quote}2019-04-17T01:58:25.898Z 17466081 [kafka-producer-network-thread | 
> client-id-enrichment-gcoint-StreamThread-7-0_10-producer] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - [Producer 
> clientId=client-id-enrichment-gcoint-StreamThread-7-0_10-producer, 
> transactionalId=application-id-enrichment-gcoint-0_10] Uncaught error in 
> kafka producer I/O thread: 
> 2019-04-17T01:58:25.898Z java.lang.IllegalStateException: Attempt to send a 
> request to node 1 which is not ready.
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:430)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:411)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:362)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:214)
> 2019-04-17T01:58:25.898Z at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
> 2019-04-17T01:58:25.898Z at java.lang.Thread.run(Thread.java:748)
> {quote}
> Later, Kafka Streams (running with EOS enabled) shuts down with a 
> `TimeoutException` that occurs during rebalance. It seems that the above error 
> results in this `TimeoutException`. However, an `IllegalStateException` seems 
> to indicate a bug in the producer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8248) Producer may fail IllegalStateException

2019-04-17 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8248:
--

 Summary: Producer may fail IllegalStateException
 Key: KAFKA-8248
 URL: https://issues.apache.org/jira/browse/KAFKA-8248
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 2.0.0
Reporter: Matthias J. Sax


In a Kafka Streams application, we observed the following log from the producer:
{quote}2019-04-17T01:58:25.898Z 17466081 [kafka-producer-network-thread | 
client-id-enrichment-gcoint-StreamThread-7-0_10-producer] ERROR 
org.apache.kafka.clients.producer.internals.Sender - [Producer 
clientId=client-id-enrichment-gcoint-StreamThread-7-0_10-producer, 
transactionalId=application-id-enrichment-gcoint-0_10] Uncaught error in kafka 
producer I/O thread: 
2019-04-17T01:58:25.898Z java.lang.IllegalStateException: Attempt to send a 
request to node 1 which is not ready.
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:430)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:411)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:362)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:214)
2019-04-17T01:58:25.898Z at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
2019-04-17T01:58:25.898Z at java.lang.Thread.run(Thread.java:748)
{quote}
Later, Kafka Streams shuts down with a `TimeoutException` that occurs during 
rebalance. It seems that the above error results in this `TimeoutException`. 
However, an `IllegalStateException` seems to indicate a bug in the producer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5998) /.checkpoint.tmp Not found exception

2019-04-17 Thread Patrik Kleindl (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820523#comment-16820523
 ] 

Patrik Kleindl commented on KAFKA-5998:
---

[~guozhang] Could you please share the direction(s) you are currently 
investigating?

Is the problem only with the file or is the whole directory affected? And if 
the latter, how come this does not impact stream processing?
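
For context, the checkpoint write in the stack traces below follows a 
write-to-temp-then-rename pattern, so if the task directory is removed 
concurrently, opening the `.tmp` file fails. A simplified sketch (not the 
actual OffsetCheckpoint code):

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class CheckpointWriteSketch {
    // Write to <file>.tmp, fsync, then rename over the real file.
    // If another thread removes the parent directory first, the
    // FileOutputStream constructor throws FileNotFoundException,
    // matching the warnings quoted below.
    static void writeCheckpoint(File checkpointFile, byte[] contents) throws IOException {
        File tmp = new File(checkpointFile.getAbsolutePath() + ".tmp");
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(contents);
            out.getFD().sync();
        }
        if (!tmp.renameTo(checkpointFile)) {
            throw new IOException("Failed to rename " + tmp + " to " + checkpointFile);
        }
    }
}
{code}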

 

> /.checkpoint.tmp Not found exception
> 
>
> Key: KAFKA-5998
> URL: https://issues.apache.org/jira/browse/KAFKA-5998
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0, 0.11.0.1, 2.1.1
>Reporter: Yogesh BG
>Priority: Critical
> Attachments: 5998.v1.txt, 5998.v2.txt, Topology.txt, exc.txt, 
> props.txt, streams.txt
>
>
> I have one Kafka broker and one Kafka Streams instance running. It has been 
> running for two days under a load of around 2500 msgs per second. On the third 
> day I am getting the exception below for some of the partitions. I have 16 
> partitions; only 0_0 and 0_1 give this error:
> {{09:43:25.955 [ks_0_inst-StreamThread-6] WARN  
> o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to 
> /data/kstreams/rtp-kafkastreams/0_1/.checkpoint:
> java.io.FileNotFoundException: 
> /data/kstreams/rtp-kafkastreams/0_1/.checkpoint.tmp (No such file or 
> directory)
> at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:221) 
> ~[na:1.7.0_111]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:171) 
> ~[na:1.7.0_111]
> at 
> org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:314)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:700)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:683)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:523)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> 09:43:25.974 [ks_0_inst-StreamThread-15] WARN  
> o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to 
> /data/kstreams/rtp-kafkastreams/0_0/.checkpoint:
> java.io.FileNotFoundException: 
> /data/kstreams/rtp-kafkastreams/0_0/.checkpoint.tmp (No such file or 
> directory)
> at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:221) 
> ~[na:1.7.0_111]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:171) 
> ~[na:1.7.0_111]
> at 
> org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> 

[jira] [Commented] (KAFKA-6520) When a Kafka Stream can't communicate with the server, it's Status stays RUNNING

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820488#comment-16820488
 ] 

ASF GitHub Bot commented on KAFKA-6520:
---

ConcurrencyPractitioner commented on pull request #6594: [KAFKA-6520] Add 
DISCONNECTED state to Kafka Streams
URL: https://github.com/apache/kafka/pull/6594
 
 
   We wish to add a DISCONNECTED state to Kafka Streams. Please see KIP-457 for 
details.
   
https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> When a Kafka Stream can't communicate with the server, it's Status stays 
> RUNNING
> 
>
> Key: KAFKA-6520
> URL: https://issues.apache.org/jira/browse/KAFKA-6520
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michael Kohout
>Assignee: Milind Jain
>Priority: Major
>  Labels: newbie, user-experience
>
> When you execute the following scenario, the application is always in the 
> RUNNING state:
>   
>  1) start Kafka
>  2) start the app; the app connects to Kafka and starts processing
>  3) kill Kafka (stop the docker container)
>  4) the application doesn't give any indication that it's no longer 
> connected (Stream State is still RUNNING, and the uncaught exception handler 
> isn't invoked)
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
> See 
> [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] 
> for a discussion from the google user forum.  
> [This|https://issues.apache.org/jira/browse/KAFKA-4564] is a link to a 
> related issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8147) Add changelog topic configuration to KTable suppress

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820474#comment-16820474
 ] 

ASF GitHub Bot commented on KAFKA-8147:
---

mjduijn commented on pull request #6593: [WIP] KAFKA-8147 Add changelog topic 
configuration to KTable suppress
URL: https://github.com/apache/kafka/pull/6593
 
 
   KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress
   Jira: https://issues.apache.org/jira/browse/KAFKA-8147
   
   Have not had time to do a full implementation, still lacking: 
   * Tests
   * Proper handling of topicConfig type erasure by `BufferConfigInternal`
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add changelog topic configuration to KTable suppress
> 
>
> Key: KAFKA-8147
> URL: https://issues.apache.org/jira/browse/KAFKA-8147
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Maarten
>Assignee: Maarten
>Priority: Minor
>  Labels: needs-kip
>
> The streams DSL does not provide a way to configure the changelog topic 
> created by KTable.suppress.
> From the perspective of an external user this could be implemented similarly 
> to the configuration of aggregate + materialized, i.e., 
> {code:java}
> changelogTopicConfigs = // Configs
> materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs)
> ..
> KGroupedStream.aggregate(..,materialized)
> {code}
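
For reference, a minimal compilable sketch of that aggregate/Materialized 
pattern (store name, types, and config values below are made up for 
illustration):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class SuppressConfigSketch {
    public static KTable<String, Long> countWithChangelogConfig(
            KGroupedStream<String, String> grouped) {
        // Custom configs for the changelog topic backing the store
        Map<String, String> changelogTopicConfigs = new HashMap<>();
        changelogTopicConfigs.put("retention.ms", "86400000");
        Materialized<String, Long, KeyValueStore<Bytes, byte[]>> materialized =
            Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store")
                .withLoggingEnabled(changelogTopicConfigs);
        return grouped.count(materialized);
    }
}
{code}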



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8241) Dynamic update of keystore fails on listener without truststore

2019-04-17 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-8241.
---
   Resolution: Fixed
 Reviewer: Manikumar
Fix Version/s: 2.1.2

> Dynamic update of keystore fails on listener without truststore
> ---
>
> Key: KAFKA-8241
> URL: https://issues.apache.org/jira/browse/KAFKA-8241
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1.1, 2.0.1, 2.2.0, 2.1.1
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.3.0, 2.1.2, 2.2.1
>
>
> Validation of dynamically updated keystores and truststores assumes that both 
> are present. On brokers with only keystores and no truststore configured, 
> dynamic update fails with NPE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-04-17 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-7965:
---
Fix Version/s: (was: 2.2.1)

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 1.1.1, 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: huxihx
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0
>
>
> To get stable nightly builds for the `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-04-17 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-7965.

Resolution: Fixed

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 1.1.1, 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: huxihx
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820445#comment-16820445
 ] 

ASF GitHub Bot commented on KAFKA-7965:
---

hachikuji commented on pull request #6557: KAFKA-7965: Fix 
testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
URL: https://github.com/apache/kafka/pull/6557
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 1.1.1, 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: huxihx
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-8155) Update Streams system tests for 2.2.0 and 2.1.1 releases

2019-04-17 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-8155:
--

Assignee: Matthias J. Sax  (was: John Roesler)

> Update Streams system tests for 2.2.0 and 2.1.1 releases
> 
>
> Key: KAFKA-8155
> URL: https://issues.apache.org/jira/browse/KAFKA-8155
> Project: Kafka
>  Issue Type: Task
>  Components: streams, system tests
>Reporter: John Roesler
>Assignee: Matthias J. Sax
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8025) Flaky Test RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest#shouldForwardAllDbOptionsCalls

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820402#comment-16820402
 ] 

ASF GitHub Bot commented on KAFKA-8025:
---

bbejeck commented on pull request #6370: KAFKA-8025: Update regex to allow any 
chars after ":"
URL: https://github.com/apache/kafka/pull/6370
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky Test 
> RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest#shouldForwardAllDbOptionsCalls
> 
>
> Key: KAFKA-8025
> URL: https://issues.apache.org/jira/browse/KAFKA-8025
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.3.0
>Reporter: Konstantine Karantasis
>Assignee: Bill Bejeck
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0
>
>
> There has been at least one occurrence where the following unit test case failed on a Jenkins 
> job that didn't involve any related changes. 
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2783/consoleFull]
> I have not been able to reproduce it locally on Linux. (For instance, 20 
> consecutive runs of this class pass all test cases.)
> {code:java}
> 14:06:13 
> org.apache.kafka.streams.state.internals.RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest
>  > shouldForwardAllDbOptionsCalls STARTED 14:06:14 
> org.apache.kafka.streams.state.internals.RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest.shouldForwardAllDbOptionsCalls
>  failed, log available in 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/streams/build/reports/testOutput/org.apache.kafka.streams.state.internals.RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest.shouldForwardAllDbOptionsCalls.test.stdout
>  14:06:14 14:06:14 
> org.apache.kafka.streams.state.internals.RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest
>  > shouldForwardAllDbOptionsCalls FAILED 14:06:14     
> java.lang.AssertionError: 14:06:14     Expected: a string matching the 
> pattern 'Unexpected method call DBOptions\.baseBackgroundCompactions((.* 
> 14:06:14     *)*):' 14:06:14          but: was "Unexpected method call 
> DBOptions.baseBackgroundCompactions():\n    DBOptions.close(): expected: 3, 
> actual: 0" 14:06:14         at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) 14:06:14         
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6) 14:06:14       
>   at 
> org.apache.kafka.streams.state.internals.RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest.verifyDBOptionsMethodCall(RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest.java:121)
>  14:06:14         at 
> org.apache.kafka.streams.state.internals.RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest.shouldForwardAllDbOptionsCalls(RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest.java:101)
>  14:06:14 14:06:14 
> org.apache.kafka.streams.state.internals.RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest
>  > shouldForwardAllColumnFamilyCalls STARTED 14:06:14 14:06:14 
> org.apache.kafka.streams.state.internals.RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest
>  > shouldForwardAllColumnFamilyCalls PASSED
> {code}
>  
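
Judging from the PR title above ("Update regex to allow any chars after ':'"),
the fix relaxes the expected-failure pattern so that extra mock detail after
the colon (e.g. "DBOptions.close(): expected: 3, actual: 0") no longer breaks
the match. A hedged sketch of that idea; the exact patterns in the test may
differ:

{code:java}
import java.util.regex.Pattern;

// Strict: nothing may follow the ':' in the EasyMock failure message.
final Pattern strict = Pattern.compile(
    "Unexpected method call DBOptions\\.baseBackgroundCompactions\\(\\):");
// Relaxed: allow any characters (including newlines) after the ':'.
final Pattern relaxed = Pattern.compile(
    "Unexpected method call DBOptions\\.baseBackgroundCompactions\\(\\):.*",
    Pattern.DOTALL);
{code}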



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8231) Expansion of ConnectClusterState interface

2019-04-17 Thread Chris Egerton (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton updated KAFKA-8231:
-
Fix Version/s: (was: 2.3.0)

> Expansion of ConnectClusterState interface
> --
>
> Key: KAFKA-8231
> URL: https://issues.apache.org/jira/browse/KAFKA-8231
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
>
> This covers [KIP-454: Expansion of the ConnectClusterState 
> interface|https://cwiki.apache.org/confluence/display/KAFKA/KIP-454%3A+Expansion+of+the+ConnectClusterState+interface]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8215) Limit memory usage of RocksDB

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820353#comment-16820353
 ] 

ASF GitHub Bot commented on KAFKA-8215:
---

ableegoldman commented on pull request #6560: KAFKA-8215: Pt I. Share block 
cache between instances
URL: https://github.com/apache/kafka/pull/6560
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Limit memory usage of RocksDB
> -
>
> Key: KAFKA-8215
> URL: https://issues.apache.org/jira/browse/KAFKA-8215
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> The memory usage of Streams is currently unbounded in part because of 
> RocksDB, which consumes memory on a per-instance basis. Each instance (i.e., 
> each persistent state store) will have its own write buffer, index blocks, 
> and block cache. The size of these can be configured individually, but there 
> is currently no way for a Streams app to limit the total memory available 
> across instances. 
>  
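
One way to bound that memory is to share a single block cache across all
RocksDB instances via a RocksDBConfigSetter. A minimal sketch, assuming a
rocksdbjni version that exposes BlockBasedTableConfig#setBlockCache (the class
name and cache size below are illustrative, not the PR's implementation):

{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;

public class SharedCacheRocksDBConfig implements RocksDBConfigSetter {

    // One cache shared by every RocksDB instance in this process
    // (100 MB budget chosen arbitrarily for the sketch).
    private static final Cache SHARED_CACHE = new LRUCache(100 * 1024 * 1024L);

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCache(SHARED_CACHE);
        options.setTableFormatConfig(tableConfig);
    }
}
{code}

Such a class would be registered via the rocksdb.config.setter property
(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).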



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8241) Dynamic update of keystore fails on listener without truststore

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820322#comment-16820322
 ] 

ASF GitHub Bot commented on KAFKA-8241:
---

rajinisivaram commented on pull request #6585: KAFKA-8241; Handle configs 
without truststore for broker keystore update
URL: https://github.com/apache/kafka/pull/6585
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Dynamic update of keystore fails on listener without truststore
> ---
>
> Key: KAFKA-8241
> URL: https://issues.apache.org/jira/browse/KAFKA-8241
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1.1, 2.0.1, 2.2.0, 2.1.1
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.3.0, 2.2.1
>
>
> Validation of dynamically updated keystores and truststores assumes that both 
> are present. On brokers with only keystores and no truststore configured, 
> dynamic update fails with NPE.
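
The shape of the fix, as a hedged sketch (all identifiers below are
hypothetical stand-ins, not the broker's actual reconfiguration code): only
run truststore-dependent validation when a truststore is actually configured.

{code:java}
// Illustrative only; types and method names are invented for the sketch.
final class KeystoreUpdateCheck {

    static void validateReconfiguration(final Object newKeystore, final Object newTruststore) {
        validateKeystore(newKeystore);      // always validate the new keystore
        if (newTruststore != null) {        // guard: listener may have no truststore;
            validateTrust(newKeystore, newTruststore); // without it this path NPEs (the bug)
        }
    }

    private static void validateKeystore(final Object keystore) { /* ... */ }

    private static void validateTrust(final Object keystore, final Object truststore) { /* ... */ }
}
{code}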



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3539) KafkaProducer.send() may block even though it returns the Future

2019-04-17 Thread Richard Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Yu updated KAFKA-3539:
--
Labels: needs-discussion needs-kip  (was: )

> KafkaProducer.send() may block even though it returns the Future
> 
>
> Key: KAFKA-3539
> URL: https://issues.apache.org/jira/browse/KAFKA-3539
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Oleg Zhurakousky
>Priority: Critical
>  Labels: needs-discussion, needs-kip
>
> You can get more details from the us...@kafka.apache.org by searching on the 
> thread with the subject "KafkaProducer block on send".
> The bottom line is that method that returns Future must never block, since it 
> essentially violates the Future contract as it was specifically designed to 
> return immediately passing control back to the user to check for completion, 
> cancel etc.
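
As a concrete illustration of that contract (producer here is assumed to be an
existing KafkaProducer<String, String>, and the topic name is arbitrary):

{code:java}
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Callers expect send() to hand control back immediately ...
final Future<RecordMetadata> future =
    producer.send(new ProducerRecord<>("my-topic", "key", "value"));
// ... so that they choose where (and whether) to block, poll, or cancel.
final RecordMetadata metadata = future.get(5, TimeUnit.SECONDS);

// The ticket's point: send() itself may block (e.g. waiting for metadata or
// for buffer space) before the Future is even returned, breaking this model.
{code}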



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8247) Duplicate error handling in kafka-server-start.sh and actual Kafka class

2019-04-17 Thread Sönke Liebau (JIRA)
Sönke Liebau created KAFKA-8247:
---

 Summary: Duplicate error handling in kafka-server-start.sh and 
actual Kafka class
 Key: KAFKA-8247
 URL: https://issues.apache.org/jira/browse/KAFKA-8247
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.1
Reporter: Sönke Liebau


There is some duplication of error handling for command line parameters that 
are passed into kafka-server-start.sh

 

The shell script prints an error if no arguments are passed in, effectively 
causing the same check in 
[Kafka|https://github.com/apache/kafka/blob/92db08cba582668d77160b0c2853efd45a1b809b/core/src/main/scala/kafka/Kafka.scala#L43]
 to never be triggered, unless the only option specified is -daemon, which is 
stripped before the arguments are passed to the Java class.

 

While not in any way critical, I don't think that this is intended behavior. I 
think we should remove the extra check in kafka-server-start.sh and leave 
argument handling up to the Kafka class.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7937) Flaky Test ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup

2019-04-17 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820256#comment-16820256
 ] 

Matthias J. Sax commented on KAFKA-7937:


[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3562/tests]

> Flaky Test ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup
> 
>
> Key: KAFKA-7937
> URL: https://issues.apache.org/jira/browse/KAFKA-7937
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, clients, unit tests
>Affects Versions: 2.2.0, 2.1.1, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Gwen Shapira
>Priority: Critical
> Fix For: 2.3.0, 2.1.2, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I create tickets for all 
> observed test failures.
> https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/19/pipeline
> {quote}kafka.admin.ResetConsumerGroupOffsetTest > 
> testResetOffsetsNotExistingGroup FAILED 
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available. at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>  at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService.resetOffsets(ConsumerGroupCommand.scala:306)
>  at 
> kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup(ResetConsumerGroupOffsetTest.scala:89)
>  Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: 
> The coordinator is not available.{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8041) Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll

2019-04-17 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820255#comment-16820255
 ] 

Matthias J. Sax commented on KAFKA-8041:


One more: 
[https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/249/tests]

> Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll
> -
>
> Key: KAFKA-8041
> URL: https://issues.apache.org/jira/browse/KAFKA-8041
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.0.1, 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.0.2, 2.3.0
>
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/236/tests]
> {quote}java.lang.AssertionError: Expected some messages
> at kafka.utils.TestUtils$.fail(TestUtils.scala:357)
> at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:787)
> at 
> kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:189)
> at 
> kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63){quote}
> STDOUT
> {quote}[2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, 
> leaderId=0, fetcherId=0] Error for partition topic-6 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-10 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-4 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-8 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-2 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:45:00,248] ERROR Error while rolling log segment for topic-0 
> in dir 
> /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216
>  (kafka.server.LogDirFailureChannel:76)
> java.io.FileNotFoundException: 
> /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216/topic-0/.index
>  (Not a directory)
> at java.io.RandomAccessFile.open0(Native Method)
> at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> at java.io.RandomAccessFile.(RandomAccessFile.java:243)
> at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:121)
> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
> at kafka.log.AbstractIndex.resize(AbstractIndex.scala:115)
> at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:184)
> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
> at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:184)
> at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501)
> at kafka.log.Log.$anonfun$roll$8(Log.scala:1520)
> at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1520)
> at scala.Option.foreach(Option.scala:257)
> at kafka.log.Log.$anonfun$roll$2(Log.scala:1520)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1881)
> at kafka.log.Log.roll(Log.scala:1484)
> at 
> kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:154)
> at 
> kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> 

[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-04-17 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820248#comment-16820248
 ] 

Matthias J. Sax commented on KAFKA-7965:


[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/20974/testReport/junit/kafka.api/ConsumerBounceTest/testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup/]

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 1.1.1, 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: huxihx
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6455) Improve timestamp propagation at DSL level

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820247#comment-16820247
 ] 

ASF GitHub Bot commented on KAFKA-6455:
---

mjsax commented on pull request #6565: KAFKA-6455: KStream-KStream join should 
set max timestamp for result record
URL: https://github.com/apache/kafka/pull/6565
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve timestamp propagation at DSL level
> --
>
> Key: KAFKA-6455
> URL: https://issues.apache.org/jira/browse/KAFKA-6455
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
>  Labels: needs-kip
>
> At DSL level, we inherit the timestamp propagation "contract" from the 
> Processor API. This contract in not optimal at DSL level, and we should 
> define a DSL level contract that matches the semantics of the corresponding 
> DSL operator.
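
For the KStream-KStream join case targeted by the PR above, the intended
DSL-level contract can be stated in one line (inferred from the PR title;
variable names are illustrative):

{code:java}
// The join result record should carry the max of the two input timestamps.
final long resultTimestamp = Math.max(leftRecord.timestamp(), rightRecord.timestamp());
{code}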



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-8187) State store record loss across multiple reassignments when using standby tasks

2019-04-17 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-8187:
--

Assignee: Matthias J. Sax  (was: Bill Bejeck)

> State store record loss across multiple reassignments when using standby tasks
> --
>
> Key: KAFKA-8187
> URL: https://issues.apache.org/jira/browse/KAFKA-8187
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.1
>Reporter: William Greer
>Assignee: Matthias J. Sax
>Priority: Major
>
> Overview:
> There is a race condition that can cause a partitioned state store to be 
> missing records up to an offset when using standby tasks.
> When a reassignment occurs and a task is migrated to a StandbyTask in another 
> StreamThread/TaskManager on the same JVM, there can be lock contention that 
> prevents the StandbyTask on the currently assigned StreamThread from 
> acquiring the lock; it will not retry acquiring the lock because all of the 
> active StreamTasks of that StreamThread are running. If the StandbyTask does 
> not acquire the lock before the StreamThread enters into the RUNNING state, 
> then the StandbyTask will not consume any records. If there is no subsequent 
> reassignment before the second execution of the stateDirCleaner Thread, then 
> the task directory for the StandbyTask will be deleted. When the next 
> reassignment occurs the offset that was read by the StandbyTask at creation 
> time before acquiring the lock will be written back to the state store 
> directory, this re-creates the state store directory.
> An example:
> StreamThread(A) and StreamThread(B) are running on the same JVM in the same 
> streams application.
> StreamThread(A) has StandbyTask 1_0
> StreamThread(B) has no tasks
> A reassignment is triggered by another host in the streams application fleet.
> StreamThread(A) is notified with a PARTITIONS_REVOKED event for the thread's 
> one task
> StreamThread(B) is notified with a PARTITIONS_ASSIGNED event of a standby 
> task for 1_0
> Here begins the race condition.
> StreamThread(B) creates the StandbyTask which reads the current checkpoint 
> from disk.
> StreamThread(B) then attempts to updateNewAndRestoringTasks() for its 
> assigned tasks. [0]
> StreamThread(B) initializes the new tasks for the active and standby tasks. 
> [1] [2]
> StreamThread(B) attempts to lock the state directory for task 1_0 but fails 
> with a LockException [3], since StreamThread(A) still holds the lock.
> StreamThread(B) returns true from updateNewAndRestoringTasks() due to the 
> check at [4] which only checks that the active assigned tasks are running.
> StreamThread(B) state is set to RUNNING
> StreamThread(A) closes the previous StandbyTask specifically calling 
> closeStateManager() [5]
> StreamThread(A) state is set to RUNNING
> Streams application for this host has completed re-balancing and is now in 
> the RUNNING state.
> State at this point is the following: State directory exists for 1_0 and all 
> data is present.
> Then, 1 to 2 intervals of [6] (default 10 minutes) after the reassignment 
> has completed, the stateDirCleaner thread will execute [7].
> The stateDirCleaner will then do [8], which finds the directory 1_0, finds 
> that there isn't an active lock for that directory, acquires the lock, and 
> deletes the directory.
> State at this point is the following: State directory does not exist for 1_0.
> When the next reassignment occurs, the offset that was read by 
> StreamThread(B) during construction of the StandbyTask for 1_0 will be 
> written back to disk. This write re-creates the state store directory and 
> writes the .checkpoint file with the old offset.
> State at this point is the following: State directory exists for 1_0 with a 
> '.checkpoint' file in it, but there is no other state store data in the 
> directory.
> If this host is assigned the active task for 1_0, then all state store 
> history from before the offset that was read at the previous reassignment 
> will be missing. 
> If this host is assigned the standby task for 1_0 then the lock will be 
> acquired and the standby will start to consume records, but it will still be 
> missing all records from before the offset that was read at the previous 
> reassignment.
> If this host is not assigned 1_0, then the state directory will get cleaned 
> up by the stateDirCleaner thread 10 to 20 minutes later and the record loss 
> issue will be hidden.
> [0] 
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L865-L869
> [1] 
> 

[jira] [Commented] (KAFKA-6951) Implement offset expiration semantics for unsubscribed topics

2019-04-17 Thread Emmanuel Brard (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820132#comment-16820132
 ] 

Emmanuel Brard commented on KAFKA-6951:
---

[~wushujames] it's not implemented yet:


{code:java}
root@kafka-1:/# kafka-consumer-groups --bootstrap-server localhost:9092 
--describe --group my_favorite_group

TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                              HOST         CLIENT-ID
bar    0          58              58              0    kafka-python-1.4.6-5f66aa52-8497-404c-9c3e-69f36139b30d  /172.19.0.5  kafka-python-1.4.6
foo    0          56              58              2    -                                                        -            -
{code}

{code:java}
root@kafka-1:/# kafka-consumer-groups --bootstrap-server localhost:9091 --group 
my_favorite_group --topic foo --delete
The consumer does not support topic-specific offset deletion from a consumer 
group.
{code}

See 
https://github.com/apache/kafka/blob/47a9871ef65a13f9d58d5ea216de340f7e123da5/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L941


> Implement offset expiration semantics for unsubscribed topics
> -
>
> Key: KAFKA-6951
> URL: https://issues.apache.org/jira/browse/KAFKA-6951
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>
> [This 
> portion|https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets#KIP-211:ReviseExpirationSemanticsofConsumerGroupOffsets-UnsubscribingfromaTopic]
>  of KIP-211 will be implemented separately from the main PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-5609) Connect log4j should log to file by default

2019-04-17 Thread Kaufman Ng (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaufman Ng reassigned KAFKA-5609:
-

Assignee: Kaufman Ng

> Connect log4j should log to file by default
> ---
>
> Key: KAFKA-5609
> URL: https://issues.apache.org/jira/browse/KAFKA-5609
> Project: Kafka
>  Issue Type: Improvement
>  Components: config, KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Yeva Byzek
>Assignee: Kaufman Ng
>Priority: Minor
>  Labels: easyfix
>
> {{https://github.com/apache/kafka/blob/trunk/config/connect-log4j.properties}}
> Currently it logs to stdout. It should also log to a file by default; 
> otherwise it just writes to the console and messages can be lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5998) /.checkpoint.tmp Not found exception

2019-04-17 Thread Rabih Hallage (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819913#comment-16819913
 ] 

Rabih Hallage commented on KAFKA-5998:
--

Hello guys,
 I'm experiencing the same issue with the following stack trace (I tried 
changing the state directory; same result):

 
{code:java}
[2019-04-16 16:42:13,459] WARN [-app-cid-1-StreamThread-1] task [0_0] 
Failed to write offset checkpoint file to 
[/opt/ial/kafka-streams/-app_v1/0_0/.checkpoint] 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)
java.io.FileNotFoundException: 
/opt/ial/kafka-streams/-app_v1/0_0/.checkpoint.tmp (No such file or 
directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.(FileOutputStream.java:213)
at java.io.FileOutputStream.(FileOutputStream.java:162)
at 
org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:79)
at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:325)
at 
org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:474)
at 
org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:596)
at 
org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:567)
at 
org.apache.kafka.streams.processor.internals.AssignedTasks.suspendTasks(AssignedTasks.java:128)
at 
org.apache.kafka.streams.processor.internals.AssignedTasks.suspend(AssignedTasks.java:97)
at 
org.apache.kafka.streams.processor.internals.TaskManager.suspendTasksAndState(TaskManager.java:242)
at 
org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:319)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:461)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:396)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java)

{code}

> /.checkpoint.tmp Not found exception
> 
>
> Key: KAFKA-5998
> URL: https://issues.apache.org/jira/browse/KAFKA-5998
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0, 0.11.0.1, 2.1.1
>Reporter: Yogesh BG
>Priority: Critical
> Attachments: 5998.v1.txt, 5998.v2.txt, Topology.txt, exc.txt, 
> props.txt, streams.txt
>
>
> I have one Kafka broker and one Kafka Streams instance running. They have 
> been running for two days under a load of around 2500 msgs per second. On 
> the third day I am getting the below exception for some of the partitions; I 
> have 16 partitions, and only 0_0 and 0_1 give this error:
> {{09:43:25.955 [ks_0_inst-StreamThread-6] WARN  
> o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to 
> /data/kstreams/rtp-kafkastreams/0_1/.checkpoint:
> java.io.FileNotFoundException: 
> /data/kstreams/rtp-kafkastreams/0_1/.checkpoint.tmp (No such file or 
> directory)
> at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
> at java.io.FileOutputStream.(FileOutputStream.java:221) 
> ~[na:1.7.0_111]
> at java.io.FileOutputStream.(FileOutputStream.java:171) 
> ~[na:1.7.0_111]
> at 
> org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415)
>  

[jira] [Commented] (KAFKA-8246) refactor topic/group instance id validation condition

2019-04-17 Thread Sönke Liebau (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819899#comment-16819899
 ] 

Sönke Liebau commented on KAFKA-8246:
-

I believe that used to be a regular expression but was changed for better 
performance, see this [pull request|https://github.com/apache/kafka/pull/3234].
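
For reference, a sketch of the character-walk style check that replaced the
regex (an illustrative re-implementation, not the exact method in Kafka's
Topic class):

{code:java}
// Equivalent of the regex [a-zA-Z0-9._-]+ written as a plain loop,
// avoiding regex-engine overhead on this hot path.
static boolean containsValidChars(final String name) {
    if (name.isEmpty())
        return false;
    for (int i = 0; i < name.length(); ++i) {
        final char c = name.charAt(i);
        final boolean valid = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
        if (!valid)
            return false;
    }
    return true;
}
{code}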

> refactor topic/group instance id validation condition
> -
>
> Key: KAFKA-8246
> URL: https://issues.apache.org/jira/browse/KAFKA-8246
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8110) Flaky Test DescribeConsumerGroupTest#testDescribeMembersWithConsumersWithoutAssignedPartitions

2019-04-17 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819784#comment-16819784
 ] 

Matthias J. Sax commented on KAFKA-8110:


Failed again: 
[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3807/testReport/junit/kafka.admin/DescribeConsumerGroupTest/testDescribeOffsetsWithConsumersWithoutAssignedPartitions/]

> Flaky Test 
> DescribeConsumerGroupTest#testDescribeMembersWithConsumersWithoutAssignedPartitions
> --
>
> Key: KAFKA-8110
> URL: https://issues.apache.org/jira/browse/KAFKA-8110
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/67/testReport/junit/kafka.admin/DescribeConsumerGroupTest/testDescribeMembersWithConsumersWithoutAssignedPartitions/]
> {quote}java.lang.AssertionError: Partition [__consumer_offsets,0] metadata 
> not propagated after 15000 ms at 
> kafka.utils.TestUtils$.fail(TestUtils.scala:381) at 
> kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:791) at 
> kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:880) at 
> kafka.utils.TestUtils$.$anonfun$createTopic$3(TestUtils.scala:318) at 
> kafka.utils.TestUtils$.$anonfun$createTopic$3$adapted(TestUtils.scala:317) at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at 
> scala.collection.immutable.Range.foreach(Range.scala:158) at 
> scala.collection.TraversableLike.map(TraversableLike.scala:237) at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:230) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
> kafka.utils.TestUtils$.createTopic(TestUtils.scala:317) at 
> kafka.utils.TestUtils$.createOffsetsTopic(TestUtils.scala:375) at 
> kafka.admin.DescribeConsumerGroupTest.testDescribeMembersWithConsumersWithoutAssignedPartitions(DescribeConsumerGroupTest.scala:372){quote}
> STDOUT
> {quote}[2019-03-14 20:01:52,347] WARN Ignoring unexpected runtime exception 
> (org.apache.zookeeper.server.NIOServerCnxnFactory:236) 
> java.nio.channels.CancelledKeyException at 
> sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at 
> sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205)
>  at java.lang.Thread.run(Thread.java:748)
> TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
> foo   0         0              0              0   -           -    -
> TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
> foo   0         0              0              0   -           -    -
> COORDINATOR (ID) ASSIGNMENT-STRATEGY STATE #MEMBERS
> localhost:44669 (0){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-04-17 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819781#comment-16819781
 ] 

Matthias J. Sax commented on KAFKA-7965:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/3807/testReport/junit/kafka.api/ConsumerBounceTest/testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup/]

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 1.1.1, 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: huxihx
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8032) Flaky Test UserQuotaTest#testQuotaOverrideDelete

2019-04-17 Thread Manikumar (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819767#comment-16819767
 ] 

Manikumar commented on KAFKA-8032:
--

[~mjsax]  Sorry for the delay. Will take a look next week.

> Flaky Test UserQuotaTest#testQuotaOverrideDelete
> 
>
> Key: KAFKA-8032
> URL: https://issues.apache.org/jira/browse/KAFKA-8032
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2830/testReport/junit/kafka.api/UserQuotaTest/testQuotaOverrideDelete/]
> {quote}java.lang.AssertionError: Client with id=QuotasTestProducer-1 should 
> have been throttled at org.junit.Assert.fail(Assert.java:89) at 
> org.junit.Assert.assertTrue(Assert.java:42) at 
> kafka.api.QuotaTestClients.verifyThrottleTimeMetric(BaseQuotaTest.scala:229) 
> at kafka.api.QuotaTestClients.verifyProduceThrottle(BaseQuotaTest.scala:215) 
> at 
> kafka.api.BaseQuotaTest.testQuotaOverrideDelete(BaseQuotaTest.scala:124){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)