[jira] [Commented] (KAFKA-7927) Read committed receives aborted events

2019-02-14 Thread huxihx (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769021#comment-16769021
 ] 

huxihx commented on KAFKA-7927:
---

[~gsomogyi] The console consumer you were using is the old Scala consumer, which 
does not support transactions and therefore ignores the given isolation level. 
Could you retry the scenario using the new consumer? For example, issue 
"kafka-console-consumer --bootstrap-server 
gsomogyi-cdh5144-220cloudera2-1.gce.cloudera.com:9092 --topic src-topic 
--from-beginning --isolation-level read_committed" and resend the to-be-aborted 
messages to check that they are not visible to the consumer.

> Read committed receives aborted events
> --
>
> Key: KAFKA-7927
> URL: https://issues.apache.org/jira/browse/KAFKA-7927
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, core, producer 
>Affects Versions: 1.0.0
>Reporter: Gabor Somogyi
>Priority: Blocker
> Attachments: KafkaProducer.scala, consumer.log, producer.log.gz
>
>
> When a Kafka client produces ~30k events and aborts the transaction at the 
> end, a consumer can read part of the aborted messages even when 
> "isolation.level" is set to "READ_COMMITTED".
> Kafka client version: 2.0.0
> Kafka broker version: 1.0.0
> Producer:
> {code:java}
> java -jar 
> kafka-producer/target/kafka-producer-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
> gsomogyi-cdh5144-220cloudera2-1.gce.cloudera.com:9092 src-topic
> {code}
> See attached code.
> Consumer:
> {code:java}
> kafka-console-consumer --zookeeper localhost:2181 --topic src-topic 
> --from-beginning --isolation-level read_committed
> {code}
> The same behavior was seen when the consumer was re-implemented in Scala.
> The whole application can be found here: 
> https://github.com/gaborgsomogyi/kafka-semantics-tester
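> The expected read_committed semantics can be illustrated with a small 
> stand-alone model (this is not the actual client code; the record and 
> transaction types here are hypothetical): a consumer in read_committed mode 
> must filter out every record belonging to an aborted transaction, which is 
> exactly what the reporter observes failing.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadCommittedModel {
    // A record tagged with the id of the transaction that produced it.
    static final class Record {
        final String value;
        final long txnId;
        Record(String value, long txnId) { this.value = value; this.txnId = txnId; }
    }

    // What a read_committed consumer should return: only records whose
    // transaction is NOT in the aborted set. The bug report claims some
    // aborted records leak through anyway.
    static List<String> readCommitted(List<Record> log, List<Long> abortedTxns) {
        List<String> visible = new ArrayList<>();
        for (Record r : log) {
            if (!abortedTxns.contains(r.txnId)) {
                visible.add(r.value);
            }
        }
        return visible;
    }
}
```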



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7880) KafkaConnect should standardize worker thread name

2019-02-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768990#comment-16768990
 ] 

ASF GitHub Bot commented on KAFKA-7880:
---

FrLy commented on pull request #6272: KAFKA-7880:Naming task thread name by 
task id
URL: https://github.com/apache/kafka/pull/6272
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> KafkaConnect should standardize worker thread name
> --
>
> Key: KAFKA-7880
> URL: https://issues.apache.org/jira/browse/KAFKA-7880
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 2.1.0
>Reporter: YeLiang
>Assignee: YeLiang
>Priority: Minor
>
> KafkaConnect creates a WorkerTask for each task assigned to it and then 
> submits the tasks to a thread pool.
> However, the 
> [Worker|https://github.com/apache/kafka/blob/2.1.0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java]
>  class initializes its thread pool using a default ThreadFactory, so the 
> thread names follow the pattern pool-[0-9]-thread-[0-9].
> When KafkaConnect is running and one of the task threads shows high CPU 
> usage, it is difficult to find out which task is under high load, because 
> a thread dump of KafkaConnect only shows a list of thread names matching 
> pool-[0-9]-thread-[0-9], even when we know the exact pid of the 
> high-CPU thread.
> Naming worker threads like connectorName-taskId would be very helpful.
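> The requested naming can be achieved with a custom ThreadFactory passed to 
> the executor. A minimal pure-JDK sketch (the class and prefix names here are 
> illustrative, not the actual Connect code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(0);

    public NamedThreadFactory(String prefix) { this.prefix = prefix; }

    @Override
    public Thread newThread(Runnable r) {
        // Produces e.g. "my-connector-task-0" instead of "pool-1-thread-1",
        // so thread dumps immediately identify the task.
        Thread t = new Thread(r, prefix + "-" + counter.getAndIncrement());
        t.setDaemon(true);
        return t;
    }

    public static ExecutorService newPool(String connectorTaskPrefix) {
        return Executors.newCachedThreadPool(new NamedThreadFactory(connectorTaskPrefix));
    }
}
```

> With this, the Worker would create its pool via NamedThreadFactory.newPool 
> instead of relying on the default factory.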





[jira] [Commented] (KAFKA-7880) KafkaConnect should standardize worker thread name

2019-02-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768984#comment-16768984
 ] 

ASF GitHub Bot commented on KAFKA-7880:
---

FrLy commented on pull request #6272: KAFKA-7880:Naming task thread name by 
task id
URL: https://github.com/apache/kafka/pull/6272
 
 
 



> KafkaConnect should standardize worker thread name
> --
>
> Key: KAFKA-7880
> URL: https://issues.apache.org/jira/browse/KAFKA-7880
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 2.1.0
>Reporter: YeLiang
>Assignee: YeLiang
>Priority: Minor
>
> KafkaConnect creates a WorkerTask for each task assigned to it and then 
> submits the tasks to a thread pool.
> However, the 
> [Worker|https://github.com/apache/kafka/blob/2.1.0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java]
>  class initializes its thread pool using a default ThreadFactory, so the 
> thread names follow the pattern pool-[0-9]-thread-[0-9].
> When KafkaConnect is running and one of the task threads shows high CPU 
> usage, it is difficult to find out which task is under high load, because 
> a thread dump of KafkaConnect only shows a list of thread names matching 
> pool-[0-9]-thread-[0-9], even when we know the exact pid of the 
> high-CPU thread.
> Naming worker threads like connectorName-taskId would be very helpful.





[jira] [Commented] (KAFKA-7921) Instable KafkaStreamsTest

2019-02-14 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768810#comment-16768810
 ] 

Matthias J. Sax commented on KAFKA-7921:


[~vvcephei] Test failed again. Can you have a look: 
[https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/15/tests]

 

> Instable KafkaStreamsTest
> -
>
> Key: KAFKA-7921
> URL: https://issues.apache.org/jira/browse/KAFKA-7921
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Matthias J. Sax
>Assignee: John Roesler
>Priority: Major
>
> {{KafkaStreamsTest}} failed multiple times, e.g.,
> {quote}java.lang.AssertionError: Condition not met within timeout 15000. 
> Streams never started.
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:365)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325)
> at 
> org.apache.kafka.streams.KafkaStreamsTest.shouldThrowOnCleanupWhileRunning(KafkaStreamsTest.java:556){quote}
> or
> {quote}java.lang.AssertionError: Condition not met within timeout 15000. 
> Streams never started.
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:365)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325)
> at 
> org.apache.kafka.streams.KafkaStreamsTest.testStateThreadClose(KafkaStreamsTest.java:255){quote}
>  
> The preserved logs are as follows:
> {quote}[2019-02-12 07:02:17,198] INFO Kafka version: 2.3.0-SNAPSHOT 
> (org.apache.kafka.common.utils.AppInfoParser:109)
> [2019-02-12 07:02:17,198] INFO Kafka commitId: 08036fa4b1e5b822 
> (org.apache.kafka.common.utils.AppInfoParser:110)
> [2019-02-12 07:02:17,199] INFO stream-client [clientId] State transition from 
> CREATED to REBALANCING (org.apache.kafka.streams.KafkaStreams:263)
> [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-238] 
> Starting (org.apache.kafka.streams.processor.internals.StreamThread:767)
> [2019-02-12 07:02:17,200] INFO stream-client [clientId] State transition from 
> REBALANCING to PENDING_SHUTDOWN (org.apache.kafka.streams.KafkaStreams:263)
> [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-239] 
> Starting (org.apache.kafka.streams.processor.internals.StreamThread:767)
> [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-238] 
> State transition from CREATED to STARTING 
> (org.apache.kafka.streams.processor.internals.StreamThread:214)
> [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-239] 
> State transition from CREATED to STARTING 
> (org.apache.kafka.streams.processor.internals.StreamThread:214)
> [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-238] 
> Informed to shut down 
> (org.apache.kafka.streams.processor.internals.StreamThread:1192)
> [2019-02-12 07:02:17,201] INFO stream-thread [clientId-StreamThread-238] 
> State transition from STARTING to PENDING_SHUTDOWN 
> (org.apache.kafka.streams.processor.internals.StreamThread:214)
> [2019-02-12 07:02:17,201] INFO stream-thread [clientId-StreamThread-239] 
> Informed to shut down 
> (org.apache.kafka.streams.processor.internals.StreamThread:1192)
> [2019-02-12 07:02:17,201] INFO stream-thread [clientId-StreamThread-239] 
> State transition from STARTING to PENDING_SHUTDOWN 
> (org.apache.kafka.streams.processor.internals.StreamThread:214)
> [2019-02-12 07:02:17,205] INFO Cluster ID: J8uJhiTKQx-Y_i9LzT0iLg 
> (org.apache.kafka.clients.Metadata:365)
> [2019-02-12 07:02:17,205] INFO Cluster ID: J8uJhiTKQx-Y_i9LzT0iLg 
> (org.apache.kafka.clients.Metadata:365)
> [2019-02-12 07:02:17,205] INFO [Consumer 
> clientId=clientId-StreamThread-238-consumer, groupId=appId] Discovered group 
> coordinator localhost:36122 (id: 2147483647 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:675)
> [2019-02-12 07:02:17,205] INFO [Consumer 
> clientId=clientId-StreamThread-239-consumer, groupId=appId] Discovered group 
> coordinator localhost:36122 (id: 2147483647 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:675)
> [2019-02-12 07:02:17,206] INFO [Consumer 
> clientId=clientId-StreamThread-238-consumer, groupId=appId] Revoking 
> previously assigned partitions [] 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:458)
> [2019-02-12 07:02:17,206] INFO [Consumer 
> clientId=clientId-StreamThread-239-consumer, groupId=appId] Revoking 
> previously assigned partitions [] 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:458)
> [2019-02-12 07:02:17,206] INFO [Consumer 
> clientId=clientId-StreamThread-238-consumer, groupId=appId] (Re-)joining 
> group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:491)
> [2019-02-12 07:02:17,206] INFO [Consumer 
> 
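> For reference, TestUtils.waitForCondition in the stack traces above is 
> essentially a polling loop that throws once the timeout elapses. A simplified 
> sketch (signature approximated; this is not the exact Kafka implementation):

```java
import java.util.function.BooleanSupplier;

public class WaitUtil {
    // Polls the condition until it becomes true or the timeout elapses,
    // then fails with a message like "Condition not met within timeout 15000".
    public static void waitForCondition(BooleanSupplier condition, long maxWaitMs,
                                        String failureMessage) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError("Condition not met within timeout "
                        + maxWaitMs + ". " + failureMessage);
            }
            Thread.sleep(Math.min(100, maxWaitMs));
        }
    }
}
```

> A test becomes flaky with such a helper when the polled state machine can 
> transition straight past the awaited state (here: RUNNING) before a poll 
> observes it, as the preserved logs suggest.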

[jira] [Commented] (KAFKA-7864) AdminZkClient.validateTopicCreate() should validate that partitions are 0-based

2019-02-14 Thread Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768798#comment-16768798
 ] 

Ryan commented on KAFKA-7864:
-

Hi [~yunfu], I'm still working on it. The code change has been approved, but I 
still need to fix some integration tests. I'm hoping to finish this in a few 
days.

> AdminZkClient.validateTopicCreate() should validate that partitions are 
> 0-based
> ---
>
> Key: KAFKA-7864
> URL: https://issues.apache.org/jira/browse/KAFKA-7864
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Assignee: Ryan
>Priority: Major
>  Labels: newbie
>
> AdminZkClient.validateTopicCreate() currently doesn't validate that partition 
> ids in a topic are consecutive, starting from 0. The client code depends on 
> that. So, it would be useful to tighten up the check.
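> The requested check can be sketched as follows (the method name and Java 
> collection types are illustrative; the real method works on the partition 
> assignment passed to AdminZkClient.validateTopicCreate()):

```java
import java.util.Set;
import java.util.TreeSet;

public class PartitionValidation {
    // Returns true iff the partition ids are exactly 0, 1, ..., n-1:
    // consecutive and starting from 0, which is what the client code assumes.
    public static boolean partitionsAreZeroBasedAndConsecutive(Set<Integer> partitionIds) {
        if (partitionIds.isEmpty()) return false;
        int expected = 0;
        for (int id : new TreeSet<>(partitionIds)) { // iterate in sorted order
            if (id != expected++) return false;
        }
        return true;
    }
}
```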





[jira] [Commented] (KAFKA-6474) Rewrite test to use new public TopologyTestDriver

2019-02-14 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768792#comment-16768792
 ] 

Matthias J. Sax commented on KAFKA-6474:


[~h314to] Just merged your latest PR. How many more do we need to resolve this 
completely? (Not urgent, but I would like to resolve this eventually.) Thanks a 
lot for all your work!

> Rewrite test to use new public TopologyTestDriver
> -
>
> Key: KAFKA-6474
> URL: https://issues.apache.org/jira/browse/KAFKA-6474
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Affects Versions: 1.1.0
>Reporter: Matthias J. Sax
>Assignee: Filipe Agapito
>Priority: Major
>  Labels: newbie
>
> With KIP-247 we added the public TopologyTestDriver. We should rewrite our 
> own tests to use this new test driver and remove the two classes 
> ProcessorTopologyTestDriver and KStreamTestDriver.





[jira] [Commented] (KAFKA-6474) Rewrite test to use new public TopologyTestDriver

2019-02-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768790#comment-16768790
 ] 

ASF GitHub Bot commented on KAFKA-6474:
---

mjsax commented on pull request #5433: KAFKA-6474: Rewrite tests to use new 
public TopologyTestDriver [part 4]
URL: https://github.com/apache/kafka/pull/5433
 
 
   
 



> Rewrite test to use new public TopologyTestDriver
> -
>
> Key: KAFKA-6474
> URL: https://issues.apache.org/jira/browse/KAFKA-6474
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Affects Versions: 1.1.0
>Reporter: Matthias J. Sax
>Assignee: Filipe Agapito
>Priority: Major
>  Labels: newbie
>
> With KIP-247 we added the public TopologyTestDriver. We should rewrite our 
> own tests to use this new test driver and remove the two classes 
> ProcessorTopologyTestDriver and KStreamTestDriver.





[jira] [Commented] (KAFKA-7864) AdminZkClient.validateTopicCreate() should validate that partitions are 0-based

2019-02-14 Thread YUN FU (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768785#comment-16768785
 ] 

YUN FU commented on KAFKA-7864:
---

[~ctchen] Hi Ryan, just wondering whether you are still working on this ticket. 
If not, I would be happy to take over and work on this. Thanks!

> AdminZkClient.validateTopicCreate() should validate that partitions are 
> 0-based
> ---
>
> Key: KAFKA-7864
> URL: https://issues.apache.org/jira/browse/KAFKA-7864
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Assignee: Ryan
>Priority: Major
>  Labels: newbie
>
> AdminZkClient.validateTopicCreate() currently doesn't validate that partition 
> ids in a topic are consecutive, starting from 0. The client code depends on 
> that. So, it would be useful to tighten up the check.





[jira] [Updated] (KAFKA-7711) Add a bounded flush() API to Kafka Producer

2019-02-14 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7711:
--
Component/s: producer 

> Add a bounded flush()  API to Kafka Producer
> 
>
> Key: KAFKA-7711
> URL: https://issues.apache.org/jira/browse/KAFKA-7711
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: kun du
>Priority: Minor
>
> Currently a call to Producer.flush() can hang for an indeterminate amount of 
> time.
> It would be good to add a bounded flush() API that times out if the producer 
> is unable to flush all the batched records within the given time. That way 
> the caller of flush() has a chance to decide what to do next instead of 
> waiting forever.
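> Until such an API exists, a bounded flush can be approximated by running the 
> unbounded flush on a separate thread and bounding the wait. A sketch of the 
> idea (the Runnable stands in for a call to the real producer's flush(), which 
> is assumed, not shown):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedFlush {
    /** Runs the given (potentially unbounded) flush action, waiting at most
     *  timeoutMs. Returns true if the flush completed in time, false on timeout,
     *  so the caller can decide what to do next instead of waiting forever. */
    public static boolean flushWithTimeout(Runnable flushAction, long timeoutMs)
            throws InterruptedException {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<?> f = exec.submit(flushAction);
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            exec.shutdownNow();
        }
    }
}
```

> Note that on timeout the underlying flush is merely abandoned, not cancelled, 
> which is why a first-class bounded flush() in the producer itself would be 
> preferable.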





[jira] [Updated] (KAFKA-7932) Streams needs to handle new Producer exceptions

2019-02-14 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7932:
---
Fix Version/s: 2.3.0

> Streams needs to handle new Producer exceptions
> ---
>
> Key: KAFKA-7932
> URL: https://issues.apache.org/jira/browse/KAFKA-7932
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Critical
> Fix For: 2.3.0
>
>
> Following on KAFKA-7763, Streams needs to handle the new behavior.
> See also https://github.com/apache/kafka/pull/6066
> Streams code (StreamTask.java) needs to be modified to handle the new 
> exception.
> Also, from another upstream change, `initTxn` can also throw TimeoutException 
> now: default `MAX_BLOCK_MS_CONFIG` in producer is 60 seconds, so I think just 
> wrapping it as StreamsException should be reasonable, similar to what we do 
> for `producer#send`'s TimeoutException 
> ([https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L220-L225]
>  ).
>  
> Note we need to handle this in three functions: init/commit/abortTxn.
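> The proposed wrapping can be sketched as follows; StreamsException and the 
> producer's TimeoutException are stubbed locally here for illustration, so 
> this shows the pattern rather than the actual StreamTask code:

```java
public class TxnExceptionWrapping {
    // Stand-in for org.apache.kafka.streams.errors.StreamsException.
    static class StreamsException extends RuntimeException {
        StreamsException(String message, Throwable cause) { super(message, cause); }
    }

    // Stand-in for the client-side TimeoutException thrown by the producer
    // after MAX_BLOCK_MS_CONFIG (60 seconds by default) elapses.
    static class TimeoutException extends RuntimeException {
        TimeoutException(String message) { super(message); }
    }

    /** Runs one of the three transactional calls (initTransactions /
     *  commitTransaction / abortTransaction) and rethrows a TimeoutException
     *  wrapped as a StreamsException, as suggested in the ticket. */
    static void runTransactionalCall(Runnable call, String callName) {
        try {
            call.run();
        } catch (TimeoutException e) {
            throw new StreamsException("Timed out while executing " + callName, e);
        }
    }
}
```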





[jira] [Updated] (KAFKA-7932) Streams needs to handle new Producer exceptions

2019-02-14 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7932:
---
Priority: Critical  (was: Major)

> Streams needs to handle new Producer exceptions
> ---
>
> Key: KAFKA-7932
> URL: https://issues.apache.org/jira/browse/KAFKA-7932
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Critical
>
> Following on KAFKA-7763, Streams needs to handle the new behavior.
> See also https://github.com/apache/kafka/pull/6066
> Streams code (StreamTask.java) needs to be modified to handle the new 
> exception.
> Also, from another upstream change, `initTxn` can also throw TimeoutException 
> now: default `MAX_BLOCK_MS_CONFIG` in producer is 60 seconds, so I think just 
> wrapping it as StreamsException should be reasonable, similar to what we do 
> for `producer#send`'s TimeoutException 
> ([https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L220-L225]
>  ).
>  
> Note we need to handle this in three functions: init/commit/abortTxn.





[jira] [Updated] (KAFKA-7932) Streams needs to handle new Producer exceptions

2019-02-14 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7932:
---
Affects Version/s: 2.2.0

> Streams needs to handle new Producer exceptions
> ---
>
> Key: KAFKA-7932
> URL: https://issues.apache.org/jira/browse/KAFKA-7932
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.2.0
>Reporter: John Roesler
>Priority: Critical
> Fix For: 2.3.0
>
>
> Following on KAFKA-7763, Streams needs to handle the new behavior.
> See also https://github.com/apache/kafka/pull/6066
> Streams code (StreamTask.java) needs to be modified to handle the new 
> exception.
> Also, from another upstream change, `initTxn` can also throw TimeoutException 
> now: default `MAX_BLOCK_MS_CONFIG` in producer is 60 seconds, so I think just 
> wrapping it as StreamsException should be reasonable, similar to what we do 
> for `producer#send`'s TimeoutException 
> ([https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L220-L225]
>  ).
>  
> Note we need to handle this in three functions: init/commit/abortTxn.





[jira] [Resolved] (KAFKA-7670) Fix flaky test - KafkaAdminClientTest.testUnreachableBootstrapServer

2019-02-14 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-7670.

Resolution: Fixed

> Fix flaky test - KafkaAdminClientTest.testUnreachableBootstrapServer
> 
>
> Key: KAFKA-7670
> URL: https://issues.apache.org/jira/browse/KAFKA-7670
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Minor
>
> It fails around once every 30 runs locally with
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node 
> assignment.
> at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
> at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
> at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
> at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
> at 
> org.apache.kafka.clients.admin.KafkaAdminClientTest.testUnreachableBootstrapServer(KafkaAdminClientTest.java:277)
> at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
> for a node assignment.
> {code}





[jira] [Comment Edited] (KAFKA-7930) StreamsResetter makes "changelog" topic naming assumptions

2019-02-14 Thread Murad M (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768628#comment-16768628
 ] 

Murad M edited comment on KAFKA-7930 at 2/14/19 7:22 PM:
-

[~guozhang], definitely, that is how it was done in the first place. Consider 
your 18-partition topic as input: put it in the appropriate `--input-topics` 
and `--intermediate-topics` arguments, then run with `--dry-run`. In the first 
section every partition is reported to be reset to the expected offset, and it 
is easy to miss the single line at the end stating that the very topic whose 18 
partitions were just reported as reset will also be deleted. As a result you 
lose a ~120GB topic that takes days to replay :)

Anyway, at this point I would consider this a bug: a topic provided in the 
`--input-topics` or `--intermediate-topics` arguments is reported to be reset 
AND reported to be deleted, and then, without `--dry-run` (StreamsResetter has 
no `--execute` argument like `kafka-consumer-groups.sh`, by the way), the same 
topic is first reset AND then deleted.

It is certainly a lapse of user attention, but it is also definitely a 
misbehavior of the tool. Say the user saw that the topic would be reset AND 
deleted, but the topic is not intended for deletion, because the user knows it 
is an input topic and should be treated as such; then what?

One way, as suggested by [~mjsax], would be to switch from StreamsResetter to 
`kafka-consumer-groups.sh`, which is still unclear to me as per a), b) and c) 
above. Does it do exactly the same thing? Is it the way to go in this 
situation? What if, in the future, StreamsResetter also does things that 
`kafka-consumer-groups.sh` would not be aware of?

On the other hand, are there topic naming guidelines anywhere that say users 
must not name their topics in certain specific ways? Since there are not, 
there can remain historical topics whose names were originally generated by 
Streams but whose purpose changed over time; there is no tool to rename a 
topic apart from copying one topic to another with a customized 
`mirror-maker`, which is a whole different story.



> StreamsResetter makes "changelog" topic naming assumptions
> --
>
> Key: KAFKA-7930
> URL: https://issues.apache.org/jira/browse/KAFKA-7930
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Affects Versions: 2.1.0
>Reporter: Murad M
>Priority: Major
>  Labels: features, needs-kip, patch-available, usability
>
> StreamsResetter deletes the topics it considers internal. Currently it just 
> checks the naming as per 
> [code|https://github.com/apache/kafka/blob/1aae604861068bb7337d4972c9dcc0c0a99c374d/core/src/main/scala/kafka/tools/StreamsResetter.java#L660].
>  If that assumption is wrong (for either the topic prefix or the suffix), 
> the tool becomes useless if the user is aware of it, and even dangerous if 
> not. It would probably be better if either:
>  * the naming assumption were optional, with internal topics supplied via an 
> argument (--internal-topics)
>  * deletion were optional (--no-delete-internal)
>  * topics included in the list of --input-topics were ignored
> I faced this when trying to reset applications with GlobalKTable topics 
> 

[jira] [Comment Edited] (KAFKA-7930) StreamsResetter makes "changelog" topic naming assumptions

2019-02-14 Thread Murad M (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768628#comment-16768628
 ] 

Murad M edited comment on KAFKA-7930 at 2/14/19 7:20 PM:
-

[~guozhang], definitely, that is how it was done in the first place. Consider 
your 18-partition topic as input: put it in the appropriate `--input-topics` 
and `--intermediate-topics` arguments, then run with `--dry-run`. In the first 
section every partition is reported to be reset to the expected offset, and it 
is easy to miss the single line at the end stating that the very topic whose 18 
partitions were just reported as reset will also be deleted. As a result you 
lose a ~120GB topic that takes days to replay :)

Anyway, at this point I would consider this a bug: a topic provided in the 
`--input-topics` or `--intermediate-topics` arguments is reported to be reset 
AND reported to be deleted, and then, without `--dry-run` (StreamsResetter has 
no `--execute` argument like `kafka-consumer-groups.sh`, by the way), the same 
topic is first reset AND then deleted.

It is certainly a lapse of user attention, but it is also definitely a 
misbehavior of the tool. Say the user saw that the topic would be reset AND 
deleted, but the topic is not intended for deletion, because the user knows it 
is an input topic and should be treated as such; then what?

One way, as suggested by [~mjsax], would be to switch from StreamsResetter to 
`kafka-consumer-groups.sh`, which is still unclear to me as per a), b) and c) 
above. Does it do exactly the same thing? Is it the way to go in this 
situation? What if, in the future, StreamsResetter also does things that 
`kafka-consumer-groups.sh` would not be aware of?

On the other hand, are there topic naming guidelines anywhere that say users 
must not name their topics in certain specific ways?



> StreamsResetter makes "changelog" topic naming assumptions
> --
>
> Key: KAFKA-7930
> URL: https://issues.apache.org/jira/browse/KAFKA-7930
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Affects Versions: 2.1.0
>Reporter: Murad M
>Priority: Major
>  Labels: features, needs-kip, patch-available, usability
>
> StreamsResetter deletes the topics it considers internal. Currently it just 
> checks the naming as per 
> [code|https://github.com/apache/kafka/blob/1aae604861068bb7337d4972c9dcc0c0a99c374d/core/src/main/scala/kafka/tools/StreamsResetter.java#L660].
>  If that assumption is wrong (for either the topic prefix or the suffix), 
> the tool becomes useless if the user is aware of it, and even dangerous if 
> not. It would probably be better if either:
>  * the naming assumption were optional, with internal topics supplied via an 
> argument (--internal-topics)
>  * deletion were optional (--no-delete-internal)
>  * topics included in the list of --input-topics were ignored
> I faced this when trying to reset applications with GlobalKTable topics 
> named *-changelog. Such topics are sometimes not desirable to delete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7932) Streams needs to handle new Producer exceptions

2019-02-14 Thread John Roesler (JIRA)
John Roesler created KAFKA-7932:
---

 Summary: Streams needs to handle new Producer exceptions
 Key: KAFKA-7932
 URL: https://issues.apache.org/jira/browse/KAFKA-7932
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: John Roesler


Following on KAFKA-7763, Streams needs to handle the new behavior.

See also https://github.com/apache/kafka/pull/6066

Streams code (StreamTask.java) needs to be modified to handle the new exception.

Also, from another upstream change, `initTxn` can now also throw a 
TimeoutException: the default `MAX_BLOCK_MS_CONFIG` in the producer is 60 
seconds, so I think just wrapping it as a StreamsException should be 
reasonable, similar to what we do for `producer#send`'s TimeoutException 
([https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L220-L225]).

Note we need to handle this in three functions: init/commit/abortTxn.
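To make the intended handling concrete, here is a minimal, self-contained sketch. The exception classes below are local stand-ins for `org.apache.kafka.common.errors.TimeoutException` and `org.apache.kafka.streams.errors.StreamsException`, and `runTxnOp` is an invented helper, not actual Streams code:

```java
// Sketch: wrap a producer TimeoutException thrown by initTxn/commitTxn/abortTxn
// into a StreamsException, mirroring what is already done for producer#send.
// The exception types are simplified local stubs, not the real Kafka classes.
class TimeoutException extends RuntimeException {
    TimeoutException(String msg) { super(msg); }
}

class StreamsException extends RuntimeException {
    StreamsException(String msg, Throwable cause) { super(msg, cause); }
}

public class TxnErrorHandling {
    // Runs one of initTxn/commitTxn/abortTxn and rethrows timeouts wrapped.
    static void runTxnOp(String opName, Runnable op) {
        try {
            op.run();
        } catch (TimeoutException e) {
            throw new StreamsException("Timed out while executing " + opName, e);
        }
    }

    public static void main(String[] args) {
        try {
            runTxnOp("initTxn", () -> { throw new TimeoutException("max.block.ms exceeded"); });
        } catch (StreamsException e) {
            // The original timeout is preserved as the cause.
            System.out.println(e.getCause().getClass().getSimpleName()); // TimeoutException
        }
    }
}
```

The same wrapper would be applied at each of the three transaction call sites.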





[jira] [Commented] (KAFKA-7930) StreamsResetter makes "changelog" topic naming assumptions

2019-02-14 Thread Murad M (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768628#comment-16768628
 ] 

Murad M commented on KAFKA-7930:


[~guozhang], definitely, that is how it was done in the first place. Consider 
an 18-partition topic as input: put it in the appropriate `\-\-input-topics` or 
`\-\-intermediate-topics` argument, then run with `\-\-dry-run`. In the first 
section of the output every partition is reported as being reset to the 
expected offset, and it is easy to miss the single line at the end stating that 
the very same topic, whose 18 partitions were just reported as reset, will also 
be deleted. As a result you lose a ~120GB topic which takes days to replay :)

Anyway, at this point I would consider this a bug: a topic provided via the 
`\-\-input-topics` or `\-\-intermediate-topics` arguments is reported to be 
reset AND reported to be deleted, and then, when run without `\-\-dry-run` 
(StreamsResetter has no `\-\-execute` argument like `kafka-consumer-groups.sh`, 
by the way), the same topic is first reset AND then deleted.

For sure it is a lapse of user attention, but it is also definitely a 
misbehavior of the tool. Say the user did notice that the topic will be reset 
AND deleted, but the topic is not intended for deletion, because the user knows 
it is an input topic and should be treated as such; then what?

One way, as suggested by [~mjsax], would be to switch from StreamsResetter to 
`kafka-consumer-groups.sh`, which is still unclear to me as per a), b) and c) 
above. Does it do exactly the same thing? Is it the way to go in this 
situation? And what if, in the future, StreamsResetter also does things which 
`kafka-consumer-groups.sh` would not be aware of?

> StreamsResetter makes "changelog" topic naming assumptions
> --
>
> Key: KAFKA-7930
> URL: https://issues.apache.org/jira/browse/KAFKA-7930
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Affects Versions: 2.1.0
>Reporter: Murad M
>Priority: Major
>  Labels: features, needs-kip, patch-available, usability
>
> StreamsResetter deletes the topics it considers internal. Currently it just 
> checks the naming as per 
> [code|https://github.com/apache/kafka/blob/1aae604861068bb7337d4972c9dcc0c0a99c374d/core/src/main/scala/kafka/tools/StreamsResetter.java#L660].
>  If that assumption is wrong (for either the topic prefix or the suffix), the 
> tool becomes useless if the user is aware of this, and even dangerous if not. 
> It would probably be better to either:
>  * make the naming assumption optional and supply internal topics via an 
> argument (--internal-topics)
>  * make deletion optional (--no-delete-internal)
>  * ignore topics which are included in the list of --input-topics
> Faced this when trying to reset applications with GlobalKTable topics named 
> *-changelog. Such topics are sometimes not desirable to delete.
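To make the failure mode concrete, the purely name-based check can be sketched as follows. This is a hypothetical reimplementation: the exact prefix/suffix rules and the example topic names are assumptions, not the actual StreamsResetter code:

```java
// Hypothetical sketch of a purely name-based "internal topic" check,
// showing why a user-managed topic ending in "-changelog" can be
// misclassified and scheduled for deletion. Not the actual resetter code.
public class InternalTopicHeuristic {
    // Assumed rule: internal topics start with "<applicationId>-" and end
    // with one of the suffixes Streams uses for its internal topics.
    public static boolean isInternal(String topic, String applicationId) {
        return topic.startsWith(applicationId + "-")
                && (topic.endsWith("-changelog") || topic.endsWith("-repartition"));
    }

    public static void main(String[] args) {
        // A genuine internal changelog matches, as intended:
        System.out.println(isInternal("my-app-KSTREAM-AGGREGATE-0000000001-changelog", "my-app"));
        // A user-managed GlobalKTable source topic that happens to end in
        // "-changelog" matches too, and would be deleted:
        System.out.println(isInternal("my-app-prices-changelog", "my-app"));
    }
}
```

An explicit `--internal-topics` list, as proposed above, would bypass this heuristic entirely.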





[jira] [Updated] (KAFKA-7897) Invalid use of epoch cache with old message format versions

2019-02-14 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-7897:
---
Affects Version/s: (was: 2.1.0)

> Invalid use of epoch cache with old message format versions
> ---
>
> Key: KAFKA-7897
> URL: https://issues.apache.org/jira/browse/KAFKA-7897
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.1.1, 2.0.1
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 1.1.2, 2.2.0, 2.1.1, 2.0.2
>
>
> Message format downgrades are not supported, but they generally work as long 
> as brokers and clients can at least continue to parse both message formats. After 
> a downgrade, the truncation logic should revert to using the high watermark, 
> but currently we use the existence of any cached epoch as the sole 
> prerequisite in order to leverage OffsetsForLeaderEpoch. This has the effect 
> of causing a massive truncation after startup which causes re-replication.
> I think our options to fix this are to either 1) clear the cache when we 
> notice a downgrade, or 2) forbid downgrades and raise an error.
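Option 1) could look roughly like the following guard on the truncation decision. The names and structure here are illustrative only, not the broker's actual code:

```java
// Illustrative sketch of option (1): only trust the leader-epoch cache for
// truncation when the active message format actually supports leader epochs;
// otherwise fall back to the high watermark. Not the actual broker logic.
public class TruncationDecision {
    static long truncationOffset(boolean formatSupportsLeaderEpochs,
                                 Long cachedEpochEndOffset,
                                 long highWatermark) {
        if (formatSupportsLeaderEpochs && cachedEpochEndOffset != null) {
            // Safe to use the OffsetsForLeaderEpoch result.
            return cachedEpochEndOffset;
        }
        // After a format downgrade the cache may be stale:
        // revert to the high watermark to avoid massive truncation.
        return highWatermark;
    }
}
```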





[jira] [Updated] (KAFKA-7897) Invalid use of epoch cache with old message format versions

2019-02-14 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-7897:
---
Affects Version/s: (was: 1.1.1)
   2.1.0

> Invalid use of epoch cache with old message format versions
> ---
>
> Key: KAFKA-7897
> URL: https://issues.apache.org/jira/browse/KAFKA-7897
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 1.1.2, 2.2.0, 2.1.1, 2.0.2
>
>
> Message format downgrades are not supported, but they generally work as long 
> as brokers and clients can at least continue to parse both message formats. After 
> a downgrade, the truncation logic should revert to using the high watermark, 
> but currently we use the existence of any cached epoch as the sole 
> prerequisite in order to leverage OffsetsForLeaderEpoch. This has the effect 
> of causing a massive truncation after startup which causes re-replication.
> I think our options to fix this are to either 1) clear the cache when we 
> notice a downgrade, or 2) forbid downgrades and raise an error.





[jira] [Updated] (KAFKA-7897) Invalid use of epoch cache with old message format versions

2019-02-14 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-7897:
---
Fix Version/s: (was: 1.1.2)

> Invalid use of epoch cache with old message format versions
> ---
>
> Key: KAFKA-7897
> URL: https://issues.apache.org/jira/browse/KAFKA-7897
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.2.0, 2.1.1, 2.0.2
>
>
> Message format downgrades are not supported, but they generally work as long 
> as brokers and clients can at least continue to parse both message formats. After 
> a downgrade, the truncation logic should revert to using the high watermark, 
> but currently we use the existence of any cached epoch as the sole 
> prerequisite in order to leverage OffsetsForLeaderEpoch. This has the effect 
> of causing a massive truncation after startup which causes re-replication.
> I think our options to fix this are to either 1) clear the cache when we 
> notice a downgrade, or 2) forbid downgrades and raise an error.





[jira] [Commented] (KAFKA-7930) StreamsResetter makes "changelog" topic naming assumptions

2019-02-14 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768604#comment-16768604
 ] 

Guozhang Wang commented on KAFKA-7930:
--

[~muradm] StreamsResetter allows the user to choose between {{dry-run}} and 
{{execute}} so that they can first take a look at the expected effects before 
executing the irreversible operations; this is the recommended pattern that 
most Kafka tools follow. Is that sufficient?
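The dry-run-then-execute pattern described above can be sketched generically. This is an illustrative toy, not the actual StreamsResetter implementation, and the action strings are made up:

```java
import java.util.ArrayList;
import java.util.List;

// Generic dry-run pattern: compute the plan of irreversible actions first,
// print it, and only apply it when dry-run mode is off. Illustrative only.
public class DryRunTool {
    final List<String> applied = new ArrayList<>();

    List<String> plan() {
        List<String> actions = new ArrayList<>();
        actions.add("reset offsets for topic input-topic");          // hypothetical action
        actions.add("delete internal topic my-app-store-changelog"); // hypothetical action
        return actions;
    }

    void run(boolean dryRun) {
        for (String action : plan()) {
            System.out.println((dryRun ? "[dry-run] would: " : "executing: ") + action);
            if (!dryRun) applied.add(action);
        }
    }

    public static void main(String[] args) {
        DryRunTool tool = new DryRunTool();
        tool.run(true);  // inspect the expected effects first
        tool.run(false); // then execute for real
    }
}
```

The complaint in this ticket is that the "delete" entry of the plan is easy to overlook when it appears after many per-partition "reset" entries.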

> StreamsResetter makes "changelog" topic naming assumptions
> --
>
> Key: KAFKA-7930
> URL: https://issues.apache.org/jira/browse/KAFKA-7930
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Affects Versions: 2.1.0
>Reporter: Murad M
>Priority: Major
>  Labels: features, needs-kip, patch-available, usability
>
> StreamsResetter deletes the topics it considers internal. Currently it just 
> checks the naming as per 
> [code|https://github.com/apache/kafka/blob/1aae604861068bb7337d4972c9dcc0c0a99c374d/core/src/main/scala/kafka/tools/StreamsResetter.java#L660].
>  If that assumption is wrong (for either the topic prefix or the suffix), the 
> tool becomes useless if the user is aware of this, and even dangerous if not. 
> It would probably be better to either:
>  * make the naming assumption optional and supply internal topics via an 
> argument (--internal-topics)
>  * make deletion optional (--no-delete-internal)
>  * ignore topics which are included in the list of --input-topics
> Faced this when trying to reset applications with GlobalKTable topics named 
> *-changelog. Such topics are sometimes not desirable to delete.





[jira] [Commented] (KAFKA-7882) StateStores are frequently closed during the 'transform' method

2019-02-14 Thread Mateusz Owczarek (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768597#comment-16768597
 ] 

Mateusz Owczarek commented on KAFKA-7882:
-

I understand that; I'm familiar with the basic wall-clock vs. event-time 
concepts. I presume there may be latencies and that wall-clock punctuation may 
not produce any events in a window; I accept this risk. What I was asking is 
why it is not guaranteed that at most one event per window is published when 
using wall-clock time in my suppress transformer.

This is crucial for me, and the 2.1.0 suppress implementation presumably does 
not provide it with the emitEarlyWhenFull approach.

> StateStores are frequently closed during the 'transform' method
> ---
>
> Key: KAFKA-7882
> URL: https://issues.apache.org/jira/browse/KAFKA-7882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.0
>Reporter: Mateusz Owczarek
>Priority: Major
>
> Hello, I have a problem with the state store being closed frequently while 
> transforming incoming records. To ensure that only one record per key and 
> window reaches an aggregate, I have created a custom Transformer (I know 
> something similar is going to be introduced with the suppress method on 
> KTable in the future, but my implementation is quite simple and in my opinion 
> should work correctly) with the following implementation:
> {code:java}
> override def transform(key: Windowed[K], value: V): (Windowed[K], V) = {
> val partition = context.partition() 
> if (partition != -1) store.put(key.key(), (value, partition), 
> key.window().start()) 
> else logger.warn(s"-1 partition")
> null //Ensuring no 1:1 forwarding, context.forward and commit logic is in the 
> punctuator callback
> }
> {code}
>  
> What I do get is the following error:
> {code:java}
> Store MyStore is currently closed{code}
> This problem appears only when the number of streaming threads (or input 
> topic partitions) is greater than 1 even if I'm just saving to the store and 
> turn off the punctuation.
> If punctuation is present, however, I sometimes get -1 as a partition value 
> in the transform method. I'm familiar with the basic docs, however, I haven't 
> found anything that could help me here.
> I build my state store like this:
> {code:java}
> val stateStore = Stores.windowStoreBuilder(
>   Stores.persistentWindowStore(
> stateStoreName,
> timeWindows.maintainMs() + timeWindows.sizeMs + 
> TimeUnit.DAYS.toMillis(1),
> timeWindows.segments,
> timeWindows.sizeMs,
> false
>   ),
>   serde[K],
>   serde[(V, Int)]
> )
> {code}
> and include it in a DSL API like this:
> {code:java}
> builder.addStateStore(stateStore)
> (...).transform(new MyTransformer(...), "MyStore")
> {code}
> INB4: I don't close any state stores manually; I gave them a retention time 
> as long as possible for the debugging stage. I tried to hotfix this with a 
> retry in the transform method, and the state stores do reopen in the end and 
> the `put` method works, but this approach is questionable and I am concerned 
> whether it actually works.
> Edit:
> May this be because the 
> {code:java}StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG{code} is set to a 
> low value? If I understand correctly, spilling to disk then happens more 
> frequently; could that cause the store to be closed?





[jira] [Commented] (KAFKA-7670) Fix flaky test - KafkaAdminClientTest.testUnreachableBootstrapServer

2019-02-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768583#comment-16768583
 ] 

ASF GitHub Bot commented on KAFKA-7670:
---

hachikuji commented on pull request #5942: KAFKA-7670: Admin client 
testUnreachableBootstrapServer() flaky test
URL: https://github.com/apache/kafka/pull/5942
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix flaky test - KafkaAdminClientTest.testUnreachableBootstrapServer
> 
>
> Key: KAFKA-7670
> URL: https://issues.apache.org/jira/browse/KAFKA-7670
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Minor
>
> It fails around once every 30 runs locally with
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node 
> assignment.
> at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
> at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
> at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
> at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
> at 
> org.apache.kafka.clients.admin.KafkaAdminClientTest.testUnreachableBootstrapServer(KafkaAdminClientTest.java:277)
> at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting 
> for a node assignment.
> {code}





[jira] [Updated] (KAFKA-7916) Streams store cleanup: unify wrapping

2019-02-14 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7916:
---
Fix Version/s: 2.2.0

> Streams store cleanup: unify wrapping
> -
>
> Key: KAFKA-7916
> URL: https://issues.apache.org/jira/browse/KAFKA-7916
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
> Fix For: 2.2.0
>
>
> The internal store handling in Streams has become quite complex, with many 
> layers of wrapping for different bookkeeping operations.
> The first thing we can do about this is to at least unify the wrapping 
> strategy, such that *all* store wrappers extend WrappedStateStore. This would 
> make the code easier to understand, since all wrappers would have the same 
> basic shape.
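The proposed shape might look like this toy sketch; the interface is heavily simplified, and the real `WrappedStateStore` in Streams has more responsibilities:

```java
// Toy sketch of the unified wrapping strategy: every wrapper extends a
// single WrappedStateStore base class that holds and delegates to the
// inner store, so all wrappers share the same basic shape.
interface StateStore {
    String name();
    boolean isOpen();
}

abstract class WrappedStateStore implements StateStore {
    protected final StateStore wrapped;

    WrappedStateStore(StateStore wrapped) { this.wrapped = wrapped; }

    // Delegation lives in one place instead of being repeated per wrapper.
    @Override public String name() { return wrapped.name(); }
    @Override public boolean isOpen() { return wrapped.isOpen(); }
}

// A bookkeeping wrapper (change-logging, metering, caching, ...) only
// overrides what it changes; plain delegation is inherited from the base.
class ChangeLoggingStore extends WrappedStateStore {
    ChangeLoggingStore(StateStore inner) { super(inner); }
}
```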





[jira] [Commented] (KAFKA-7811) Avoid unnecessary lock acquire when KafkaConsumer commits offsets

2019-02-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768570#comment-16768570
 ] 

ASF GitHub Bot commented on KAFKA-7811:
---

guozhangwang commented on pull request #6119: KAFKA-7811: Avoid unnecessary 
lock acquire when KafkaConsumer commits offsets
URL: https://github.com/apache/kafka/pull/6119
 
 
   
 



> Avoid unnecessary lock acquire when KafkaConsumer commits offsets
> -
>
> Key: KAFKA-7811
> URL: https://issues.apache.org/jira/browse/KAFKA-7811
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.2.2, 0.11.0.3, 1.0.2, 1.1.1, 2.0.1, 2.1.0
>Reporter: lambdaliu
>Assignee: lambdaliu
>Priority: Major
>
> In KafkaConsumer#commitAsync that does not take offset parameters, we have 
> the following logic:
> {code:java}
> public void commitAsync(OffsetCommitCallback callback) {
> acquireAndEnsureOpen();
> try {
> commitAsync(subscriptions.allConsumed(), callback);
> } finally {
> release();
> }
> }
> {code}
> This function calls another commitAsync overload with the default 
> all-consumed offsets, which also calls `acquireAndEnsureOpen`.
>  
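One common way to avoid the double acquire is to have the public overloads delegate to a private helper that assumes the lock is already held. The sketch below illustrates the pattern with a counter standing in for the real light-weight lock; it is a simplification, not the actual KafkaConsumer code:

```java
// Pattern sketch: public entry points acquire once and delegate to a
// private helper; they never call another public method that re-acquires.
// The counter stands in for KafkaConsumer's light-weight lock.
public class Client {
    int acquisitions = 0;

    void acquireAndEnsureOpen() { acquisitions++; }
    void release() {}

    // No-arg overload: acquires once, then delegates.
    void commitAsync() {
        acquireAndEnsureOpen();
        try {
            doCommitAsync("allConsumed");
        } finally {
            release();
        }
    }

    // Offset-taking overload: also acquires exactly once.
    void commitAsync(String offsets) {
        acquireAndEnsureOpen();
        try {
            doCommitAsync(offsets);
        } finally {
            release();
        }
    }

    // Assumes the lock is already held by the caller.
    private void doCommitAsync(String offsets) {
        // actual commit logic would go here
    }
}
```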





[jira] [Commented] (KAFKA-7882) StateStores are frequently closed during the 'transform' method

2019-02-14 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768565#comment-16768565
 ] 

Matthias J. Sax commented on KAFKA-7882:


Because wall-clock time eviction does not know anything about event-time 
progress – event-time is embedded in the record payload while wall-clock time 
(as the name suggests) is based on system time. There is no relationship 
between them in general.
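A toy model of that distinction (simplified; in Streams, stream time is essentially the maximum record timestamp observed so far, while wall-clock punctuation follows the system clock):

```java
// Toy model: event time ("stream time") advances only when records with
// larger timestamps are observed; wall-clock time advances regardless.
// This is why there is no general relationship between the two, and why
// wall-clock punctuation gives no per-window guarantees.
public class StreamTimeTracker {
    private long streamTime = Long.MIN_VALUE;

    void observe(long recordTimestampMs) {
        // Stream time never goes backwards, even for late records.
        streamTime = Math.max(streamTime, recordTimestampMs);
    }

    long streamTime() { return streamTime; }

    public static void main(String[] args) {
        StreamTimeTracker t = new StreamTimeTracker();
        t.observe(100);
        t.observe(40); // a late record does not move stream time back
        System.out.println(t.streamTime()); // prints 100
    }
}
```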

> StateStores are frequently closed during the 'transform' method
> ---
>
> Key: KAFKA-7882
> URL: https://issues.apache.org/jira/browse/KAFKA-7882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.0
>Reporter: Mateusz Owczarek
>Priority: Major
>
> Hello, I have a problem with the state store being closed frequently while 
> transforming incoming records. To ensure that only one record per key and 
> window reaches an aggregate, I have created a custom Transformer (I know 
> something similar is going to be introduced with the suppress method on 
> KTable in the future, but my implementation is quite simple and in my opinion 
> should work correctly) with the following implementation:
> {code:java}
> override def transform(key: Windowed[K], value: V): (Windowed[K], V) = {
> val partition = context.partition() 
> if (partition != -1) store.put(key.key(), (value, partition), 
> key.window().start()) 
> else logger.warn(s"-1 partition")
> null //Ensuring no 1:1 forwarding, context.forward and commit logic is in the 
> punctuator callback
> }
> {code}
>  
> What I do get is the following error:
> {code:java}
> Store MyStore is currently closed{code}
> This problem appears only when the number of streaming threads (or input 
> topic partitions) is greater than 1 even if I'm just saving to the store and 
> turn off the punctuation.
> If punctuation is present, however, I sometimes get -1 as a partition value 
> in the transform method. I'm familiar with the basic docs, however, I haven't 
> found anything that could help me here.
> I build my state store like this:
> {code:java}
> val stateStore = Stores.windowStoreBuilder(
>   Stores.persistentWindowStore(
> stateStoreName,
> timeWindows.maintainMs() + timeWindows.sizeMs + 
> TimeUnit.DAYS.toMillis(1),
> timeWindows.segments,
> timeWindows.sizeMs,
> false
>   ),
>   serde[K],
>   serde[(V, Int)]
> )
> {code}
> and include it in a DSL API like this:
> {code:java}
> builder.addStateStore(stateStore)
> (...).transform(new MyTransformer(...), "MyStore")
> {code}
> INB4: I don't close any state stores manually; I gave them a retention time 
> as long as possible for the debugging stage. I tried to hotfix this with a 
> retry in the transform method, and the state stores do reopen in the end and 
> the `put` method works, but this approach is questionable and I am concerned 
> whether it actually works.
> Edit:
> May this be because the 
> {code:java}StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG{code} is set to a 
> low value? If I understand correctly, spilling to disk then happens more 
> frequently; could that cause the store to be closed?





[jira] [Resolved] (KAFKA-7882) StateStores are frequently closed during the 'transform' method

2019-02-14 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7882.

Resolution: Duplicate

> StateStores are frequently closed during the 'transform' method
> ---
>
> Key: KAFKA-7882
> URL: https://issues.apache.org/jira/browse/KAFKA-7882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.0
>Reporter: Mateusz Owczarek
>Priority: Major
>
> Hello, I have a problem with the state store being closed frequently while 
> transforming incoming records. To ensure that only one record per key and 
> window reaches an aggregate, I have created a custom Transformer (I know 
> something similar is going to be introduced with the suppress method on 
> KTable in the future, but my implementation is quite simple and in my opinion 
> should work correctly) with the following implementation:
> {code:java}
> override def transform(key: Windowed[K], value: V): (Windowed[K], V) = {
> val partition = context.partition() 
> if (partition != -1) store.put(key.key(), (value, partition), 
> key.window().start()) 
> else logger.warn(s"-1 partition")
> null //Ensuring no 1:1 forwarding, context.forward and commit logic is in the 
> punctuator callback
> }
> {code}
>  
> What I do get is the following error:
> {code:java}
> Store MyStore is currently closed{code}
> This problem appears only when the number of streaming threads (or input 
> topic partitions) is greater than 1 even if I'm just saving to the store and 
> turn off the punctuation.
> If punctuation is present, however, I sometimes get -1 as a partition value 
> in the transform method. I'm familiar with the basic docs, however, I haven't 
> found anything that could help me here.
> I build my state store like this:
> {code:java}
> val stateStore = Stores.windowStoreBuilder(
>   Stores.persistentWindowStore(
> stateStoreName,
> timeWindows.maintainMs() + timeWindows.sizeMs + 
> TimeUnit.DAYS.toMillis(1),
> timeWindows.segments,
> timeWindows.sizeMs,
> false
>   ),
>   serde[K],
>   serde[(V, Int)]
> )
> {code}
> and include it in a DSL API like this:
> {code:java}
> builder.addStateStore(stateStore)
> (...).transform(new MyTransformer(...), "MyStore")
> {code}
> INB4: I don't close any state stores manually; I gave them a retention time 
> as long as possible for the debugging stage. I tried to hotfix this with a 
> retry in the transform method, and the state stores do reopen in the end and 
> the `put` method works, but this approach is questionable and I am concerned 
> whether it actually works.
> Edit:
> May this be because the 
> {code:java}StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG{code} is set to a 
> low value? If I understand correctly, spilling to disk then happens more 
> frequently; could that cause the store to be closed?





[jira] [Updated] (KAFKA-7811) Avoid unnecessary lock acquire when KafkaConsumer commits offsets

2019-02-14 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-7811:
-
Description: 
In KafkaConsumer#commitAsync that does not take offset parameters, we have the 
following logic:
{code:java}
public void commitAsync(OffsetCommitCallback callback) {
acquireAndEnsureOpen();
try {
commitAsync(subscriptions.allConsumed(), callback);
} finally {
release();
}
}
{code}
This function calls another commitAsync overload with the default all-consumed 
offsets, which also calls `acquireAndEnsureOpen`.

 

  was:
In KafkaConsumer#commitSync, we have the following logic:
{code:java}
public void commitAsync(OffsetCommitCallback callback) {
acquireAndEnsureOpen();
try {
commitAsync(subscriptions.allConsumed(), callback);
} finally {
release();
}
}
{code}
This function calls another commitAsync which also call `acquireAndEnsureOpen`.

 


> Avoid unnecessary lock acquire when KafkaConsumer commits offsets
> -
>
> Key: KAFKA-7811
> URL: https://issues.apache.org/jira/browse/KAFKA-7811
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.2.2, 0.11.0.3, 1.0.2, 1.1.1, 2.0.1, 2.1.0
>Reporter: lambdaliu
>Assignee: lambdaliu
>Priority: Major
>
> In KafkaConsumer#commitAsync that does not take offset parameters, we have 
> the following logic:
> {code:java}
> public void commitAsync(OffsetCommitCallback callback) {
> acquireAndEnsureOpen();
> try {
> commitAsync(subscriptions.allConsumed(), callback);
> } finally {
> release();
> }
> }
> {code}
> This function calls another commitAsync overload with the default 
> all-consumed offsets, which also calls `acquireAndEnsureOpen`.
>  





[jira] [Resolved] (KAFKA-7916) Streams store cleanup: unify wrapping

2019-02-14 Thread John Roesler (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-7916.
-
Resolution: Fixed

> Streams store cleanup: unify wrapping
> -
>
> Key: KAFKA-7916
> URL: https://issues.apache.org/jira/browse/KAFKA-7916
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
>
> The internal store handling in Streams has become quite complex, with many 
> layers of wrapping for different bookkeeping operations.
> The first thing we can do about this is to at least unify the wrapping 
> strategy, such that *all* store wrappers extend WrappedStateStore. This would 
> make the code easier to understand, since all wrappers would have the same 
> basic shape.





[jira] [Closed] (KAFKA-7916) Streams store cleanup: unify wrapping

2019-02-14 Thread John Roesler (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler closed KAFKA-7916.
---

> Streams store cleanup: unify wrapping
> -
>
> Key: KAFKA-7916
> URL: https://issues.apache.org/jira/browse/KAFKA-7916
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
>
> The internal store handling in Streams has become quite complex, with many 
> layers of wrapping for different bookkeeping operations.
> The first thing we can do about this is to at least unify the wrapping 
> strategy, such that *all* store wrappers extend WrappedStateStore. This would 
> make the code easier to understand, since all wrappers would have the same 
> basic shape.





[jira] [Commented] (KAFKA-7916) Streams store cleanup: unify wrapping

2019-02-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768494#comment-16768494
 ] 

ASF GitHub Bot commented on KAFKA-7916:
---

guozhangwang commented on pull request #6255: KAFKA-7916: Unify store wrapping 
code for clarity
URL: https://github.com/apache/kafka/pull/6255
 
 
   
 



> Streams store cleanup: unify wrapping
> -
>
> Key: KAFKA-7916
> URL: https://issues.apache.org/jira/browse/KAFKA-7916
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
>
> The internal store handling in Streams has become quite complex, with many 
> layers of wrapping for different bookkeeping operations.
> The first thing we can do about this is to at least unify the wrapping 
> strategy, such that *all* store wrappers extend WrappedStateStore. This would 
> make the code easier to understand, since all wrappers would have the same 
> basic shape.





[jira] [Updated] (KAFKA-5722) Refactor ConfigCommand to use the AdminClient

2019-02-14 Thread Viktor Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi updated KAFKA-5722:
--
Fix Version/s: (was: 2.2.0)
   2.3.0

> Refactor ConfigCommand to use the AdminClient
> -
>
> Key: KAFKA-5722
> URL: https://issues.apache.org/jira/browse/KAFKA-5722
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Viktor Somogyi
>Assignee: Viktor Somogyi
>Priority: Major
>  Labels: kip, needs-kip
> Fix For: 2.3.0
>
>
> The ConfigCommand currently uses a direct connection to zookeeper. The 
> zookeeper dependency should be deprecated and an AdminClient API created to 
> be used instead.
> This change requires a KIP.





[jira] [Updated] (KAFKA-5453) Controller may miss requests sent to the broker when zk session timeout happens.

2019-02-14 Thread Viktor Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi updated KAFKA-5453:
--
Fix Version/s: (was: 2.2.0)
   2.3.0

> Controller may miss requests sent to the broker when zk session timeout 
> happens.
> 
>
> Key: KAFKA-5453
> URL: https://issues.apache.org/jira/browse/KAFKA-5453
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.11.0.0
>Reporter: Jiangjie Qin
>Assignee: Viktor Somogyi
>Priority: Major
> Fix For: 2.3.0
>
>
> The issue I encountered was the following:
> 1. Partition reassignment was in progress, one replica of a partition is 
> being reassigned from broker 1 to broker 2.
> 2. Controller received an ISR change notification which indicates broker 2 
> has caught up.
> 3. Controller was sending StopReplicaRequest to broker 1.
> 4. Broker 1 zk session timeout occurs. Controller removed broker 1 from the 
> cluster and cleaned up the queue. i.e. the StopReplicaRequest was removed 
> from the ControllerChannelManager.
> 5. Broker 1 reconnected to zk and acted as if it were still a follower 
> replica of the partition. 
> 6. Broker 1 will always receive an exception from the leader because it is 
> not in the replica list.
> Not sure what the correct fix is here. It seems that broker 1 in this case 
> should ask the controller for the latest replica assignment.
> There are two related bugs:
> 1. when a {{NotAssignedReplicaException}} is thrown from 
> {{Partition.updateReplicaLogReadResult()}}, the other partitions in the same 
> request will fail to update the fetch timestamp and offset and thus also 
> drop out of the ISR.
> 2. The {{NotAssignedReplicaException}} was not properly returned to the 
> replicas; instead, an UnknownServerException is returned.





[jira] [Commented] (KAFKA-7736) Consolidate Map usages in TransactionManager

2019-02-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768363#comment-16768363
 ] 

ASF GitHub Bot commented on KAFKA-7736:
---

viktorsomogyi commented on pull request #6270: KAFKA-7736: Consolidate Map 
usages in TransactionManager
URL: https://github.com/apache/kafka/pull/6270
 
 
   Refactors the various maps used in TransactionManager into one map to 
simplify bookkeeping of inflight batches, offsets and sequence numbers.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 



> Consolidate Map usages in TransactionManager
> 
>
> Key: KAFKA-7736
> URL: https://issues.apache.org/jira/browse/KAFKA-7736
> Project: Kafka
>  Issue Type: Task
>  Components: producer 
>Reporter: Viktor Somogyi
>Assignee: Viktor Somogyi
>Priority: Minor
>  Labels: exactly-once
>
> There are a bunch of Map collections in TransactionManager 
> which could be consolidated into a single map to consolidate bookkeeping and 
> get rid of potential bugs.





[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-02-14 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768321#comment-16768321
 ] 

Abhi commented on KAFKA-7925:
-

I keep seeing the exception below in all server logs. No clients are able to 
connect to the Kafka brokers; connection attempts keep timing out. Could 
anyone please help with this issue or provide a workaround?


java.net.SocketTimeoutException: Failed to connect within 3 ms
at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
at kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:257)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2019-02-14 09:20:00,617] INFO [ReplicaFetcher replicaId=1, leaderId=6, fetcherId=0] Error sending fetch request (sessionId=841897464, epoch=INITIAL) to node 6: java.net.SocketTimeoutException: Failed to connect within 3 ms. (org.apache.kafka.clients.FetchSessionHandler)


> Constant 100% cpu usage by all kafka brokers
> 
>
> Key: KAFKA-7925
> URL: https://issues.apache.org/jira/browse/KAFKA-7925
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
> Environment: Java 11, Kafka v2.1.0
>Reporter: Abhi
>Priority: Critical
> Attachments: threadump20190212.txt
>
>
> Hi,
> I am seeing constant 100% cpu usage on all brokers in our kafka cluster even 
> without any clients connected to any broker.
> We have seen this bug multiple times in our kafka setup, which is not yet 
> open to clients. It is becoming a blocker for our deployment now.
> I am seeing a lot of connections to other brokers in CLOSE_WAIT state (see 
> below). In thread usage, I am seeing these threads 
> 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2'
>  taking up more than 90% of the cpu time in a 60s interval.
> I have attached a thread dump of one of the brokers in the cluster.
> *Java version:*
> openjdk 11.0.2 2019-01-15
> OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
> *Kafka verison:* v2.1.0
>  
> *connections:*
> java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
> java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
> java 144319 kafkagod 104u IPv4 3064219 0t0 TCP 
> mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
> java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
> java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP 
> mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
> java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP 
> mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP 
> mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP 
> mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP 
> mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP 
> mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP 
> mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP 
> mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP 
> mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP 
> mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2035u IPv4 30012898 0t0 TCP 
> mwkafka-prod-02.tbd:36866->mwkafka-prod-01.dr:9092 (FIN_WAIT2)
> java 144319 kafkagod 2036u IPv4 30004729 0t0 TCP 
> mwkafka-prod-02.tbd:36882->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2037u IPv4 30004914 0t0 TCP 
> mwkafka-prod-02.tbd:58426->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2038u IPv4 30015651 0t0 TCP 
> 

[jira] [Commented] (KAFKA-7736) Consolidate Map usages in TransactionManager

2019-02-14 Thread Viktor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768285#comment-16768285
 ] 

Viktor Somogyi commented on KAFKA-7736:
---

[~hachikuji] Had a go at refactoring these map usages into a single map. My 
approach was to create a TopicPartition to TopicPartitionEntry map which 
contains the values of the old maps, so that this data can be handled 
together. Please have a look once you have some time.
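The single-map approach described above might look roughly like this in plain Java. These are hypothetical names (a String stands in for TopicPartition, and the real TransactionManager tracks considerably more per-partition state); the point is that one entry object bundles what was previously spread across several independent maps keyed by the same partition:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// One entry bundles the per-partition bookkeeping that used to live in
// several parallel maps, so the fields can no longer drift out of sync.
final class TopicPartitionEntry {
    int nextSequence = 0;                                      // was its own Map<TopicPartition, Integer>
    long lastAckedOffset = -1L;                                // was its own Map<TopicPartition, Long>
    final Deque<String> inflightBatches = new ArrayDeque<>();  // was its own Map<TopicPartition, Deque<...>>
}

final class TransactionState {
    private final Map<String, TopicPartitionEntry> entries = new HashMap<>();

    // Single lookup/creation point replaces per-map getOrDefault calls.
    TopicPartitionEntry entryFor(String topicPartition) {
        return entries.computeIfAbsent(topicPartition, tp -> new TopicPartitionEntry());
    }
}

public class ConsolidatedMapDemo {
    public static void main(String[] args) {
        TransactionState state = new TransactionState();
        state.entryFor("topic-0").inflightBatches.add("batch-1");
        state.entryFor("topic-0").nextSequence = 1;
        System.out.println(state.entryFor("topic-0").inflightBatches);
    }
}
```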

> Consolidate Map usages in TransactionManager
> 
>
> Key: KAFKA-7736
> URL: https://issues.apache.org/jira/browse/KAFKA-7736
> Project: Kafka
>  Issue Type: Task
>  Components: producer 
>Reporter: Viktor Somogyi
>Assignee: Viktor Somogyi
>Priority: Minor
>  Labels: exactly-once
>
> There are a bunch of Map collections in TransactionManager 
> which could be consolidated into a single map to consolidate bookkeeping and 
> get rid of potential bugs.





[jira] [Commented] (KAFKA-7882) StateStores are frequently closed during the 'transform' method

2019-02-14 Thread Mateusz Owczarek (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768064#comment-16768064
 ] 

Mateusz Owczarek commented on KAFKA-7882:
-

Yes, we can, but there is still one unanswered question (maybe not related to 
the subject of the issue, but forgive me, I'm very curious):
{quote}_Offhand, if you were previously using wall-clock time, you don't need 
semantically airtight suppression behavior, so, emitting early when the buffer 
fills up should be fine._
{quote}
_As far as I understand, emitEarlyWhenFull does not guarantee one event per 
window so it's not sufficient for me. Could you elaborate on why any wall-clock 
time implementation does not ensure it either?_

> StateStores are frequently closed during the 'transform' method
> ---
>
> Key: KAFKA-7882
> URL: https://issues.apache.org/jira/browse/KAFKA-7882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.0
>Reporter: Mateusz Owczarek
>Priority: Major
>
> Hello, I have a problem with the state store being closed frequently while 
> transforming incoming records. To ensure that only one record per key and 
> window reaches the aggregate, I have created a custom Transformer (I know 
> something similar is going to be introduced with the suppress method on 
> KTable in the future, but my implementation is quite simple and imo should 
> work correctly) with the following implementation:
> {code:java}
> override def transform(key: Windowed[K], value: V): (Windowed[K], V) = {
>   val partition = context.partition()
>   if (partition != -1)
>     store.put(key.key(), (value, partition), key.window().start())
>   else
>     logger.warn(s"-1 partition")
>   // Ensuring no 1:1 forwarding; context.forward and commit logic is in the
>   // punctuator callback
>   null
> }
> {code}
>  
> What I do get is the following error:
> {code:java}
> Store MyStore is currently closed{code}
> This problem appears only when the number of streaming threads (or input 
> topic partitions) is greater than 1 even if I'm just saving to the store and 
> turn off the punctuation.
> If punctuation is present, however, I sometimes get -1 as a partition value 
> in the transform method. I'm familiar with the basic docs, however, I haven't 
> found anything that could help me here.
> I build my state store like this:
> {code:java}
> val stateStore = Stores.windowStoreBuilder(
>   Stores.persistentWindowStore(
>     stateStoreName,
>     timeWindows.maintainMs() + timeWindows.sizeMs + TimeUnit.DAYS.toMillis(1),
>     timeWindows.segments,
>     timeWindows.sizeMs,
>     false
>   ),
>   serde[K],
>   serde[(V, Int)]
> )
> {code}
> and include it in a DSL API like this:
> {code:java}
> builder.addStateStore(stateStore)
> (...).transform(new MyTransformer(...), "MyStore")
> {code}
> INB4: I don't close any state stores manually, and I gave them as long a 
> retention time as possible for the debugging stage. I tried to hotfix this 
> by retrying in the transform method; the state stores eventually reopen and 
> the `put` method succeeds, but this approach is questionable and I am not 
> confident it actually works.
> Edit:
> Could this be because 
> {code:java}StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG{code} is set to a 
> low value? If I understand correctly, spilling to disk then happens more 
> frequently; could that cause the store to be closed?
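The per-window deduplication idea behind the transformer above can be sketched in plain Java. This is a hypothetical class, not the reporter's code or the Streams API: the real implementation buffers in a Kafka Streams window store and forwards one record per window from a punctuator, while this sketch only shows the "last write per (key, window start) wins" bookkeeping:

```java
import java.util.HashMap;
import java.util.Map;

// Keeps only the latest value per (key, windowStart). A punctuator would
// later forward one record per buffered window and clear the buffer.
final class WindowDeduplicator<K, V> {
    private final Map<String, V> buffer = new HashMap<>();

    void put(K key, long windowStart, V value) {
        buffer.put(key + "@" + windowStart, value); // last write wins
    }

    V get(K key, long windowStart) {
        return buffer.get(key + "@" + windowStart);
    }

    int size() { return buffer.size(); }
}

public class DedupDemo {
    public static void main(String[] args) {
        WindowDeduplicator<String, Integer> dedup = new WindowDeduplicator<>();
        dedup.put("sensor-1", 0L, 10);
        dedup.put("sensor-1", 0L, 20); // same key and window: overwrites
        System.out.println(dedup.get("sensor-1", 0L) + " / " + dedup.size());
    }
}
```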





[jira] [Commented] (KAFKA-7930) StreamsResetter makes "changelog" topic naming assumptions

2019-02-14 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767975#comment-16767975
 ] 

Matthias J. Sax commented on KAFKA-7930:


Fair enough. :)

> StreamsResetter makes "changelog" topic naming assumptions
> --
>
> Key: KAFKA-7930
> URL: https://issues.apache.org/jira/browse/KAFKA-7930
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Affects Versions: 2.1.0
>Reporter: Murad M
>Priority: Major
>  Labels: features, needs-kip, patch-available, usability
>
> StreamsResetter deletes the topics considered internal. Currently it just 
> checks the naming as per 
> [code|https://github.com/apache/kafka/blob/1aae604861068bb7337d4972c9dcc0c0a99c374d/core/src/main/scala/kafka/tools/StreamsResetter.java#L660].
>  If the assumption is wrong (about either the topic prefix or the suffix), 
> the tool becomes useless if you are aware of it and even dangerous if you 
> are not. It would probably be better if either:
>  * the naming assumption were optional, with internal topics supplied via an 
> argument (--internal-topics)
>  * deletion were optional (--no-delete-internal)
>  * topics included in the list of --input-topics were ignored
> Faced this when trying to reset applications with GlobalKTable topics named 
> as *-changelog. Such topics are sometimes not desirable to delete.
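The naming heuristic being objected to amounts to a prefix/suffix check, roughly as follows. This is a simplified sketch with hypothetical names (the exact logic is in the code linked in the description):

```java
// A topic is treated as internal if it starts with "<application.id>-" and
// ends with one of the suffixes Streams uses for its own topics.
final class InternalTopicMatcher {
    static boolean isInternalTopic(String topic, String applicationId) {
        return topic.startsWith(applicationId + "-")
            && (topic.endsWith("-changelog") || topic.endsWith("-repartition"));
    }

    public static void main(String[] args) {
        System.out.println(isInternalTopic("app-agg-repartition", "app"));
        System.out.println(isInternalTopic("other-topic", "app"));
    }
}
```

The hazard the reporter describes follows directly: a user-managed topic that happens to match the pattern, such as a GlobalKTable source topic ending in -changelog under the same application prefix, would be classified as internal and deleted.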





[jira] [Commented] (KAFKA-7928) Deprecate WindowStore.put(key, value)

2019-02-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767952#comment-16767952
 ] 

ASF GitHub Bot commented on KAFKA-7928:
---

ouertani commented on pull request #6268: KAFKA-7928
URL: https://github.com/apache/kafka/pull/6268
 
 
   *More detailed description of your change,
   if necessary. The PR title and PR message become
   the squashed commit message, so use a separate
   comment to ping reviewers.*
   
   *Summary of testing strategy (including rationale)
   for the feature or bug fix. Unit and/or integration
   tests are expected for any behaviour change and
   system tests should be considered for larger changes.*
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [x] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Deprecate WindowStore.put(key, value)
> -
>
> Key: KAFKA-7928
> URL: https://issues.apache.org/jira/browse/KAFKA-7928
> Project: Kafka
>  Issue Type: Improvement
>Reporter: John Roesler
>Priority: Major
>  Labels: beginner, easy-fix, needs-kip, newbie
>
> Specifically, `org.apache.kafka.streams.state.WindowStore#put(K, V)`
> This method is strange... A window store needs to have a timestamp associated 
> with the key, so if you do a put without a timestamp, it's up to the store to 
> just make one up.
> Even the javadoc on the method recommends not to use it, due to this 
> confusing behavior.
> We should just deprecate it.
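A toy illustration of why the timestamp-less put is confusing (hypothetical class, not the real WindowStore API): since window-store entries are keyed by (key, timestamp), a put without a timestamp forces the store to silently invent one.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a window store, keyed by (key, window timestamp).
final class ToyWindowStore<K, V> {
    private final Map<String, V> entries = new HashMap<>();
    private long observedTimestamp = 0L;

    // The store's notion of "current" time, updated as records are processed.
    void setObservedTimestamp(long ts) { observedTimestamp = ts; }

    // Explicit-timestamp variant: unambiguous about which window is written.
    void put(K key, V value, long windowStartTimestamp) {
        entries.put(key + "@" + windowStartTimestamp, value);
    }

    // Timestamp-less variant (the kind the ticket wants deprecated): it has
    // to fall back to a timestamp the caller never stated.
    void put(K key, V value) {
        put(key, value, observedTimestamp);
    }

    V fetch(K key, long windowStartTimestamp) {
        return entries.get(key + "@" + windowStartTimestamp);
    }
}

public class WindowPutDemo {
    public static void main(String[] args) {
        ToyWindowStore<String, Integer> store = new ToyWindowStore<>();
        store.setObservedTimestamp(100L);
        store.put("k", 1); // which window did this write? the caller can't tell
        System.out.println(store.fetch("k", 100L));
    }
}
```

The caller of the two-argument put cannot tell which window was written without knowing the store's internal clock, which is exactly the confusing behavior the javadoc warns about.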


