[jira] [Commented] (KAFKA-8643) Incompatible MemberDescription constructor change

2019-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883526#comment-16883526
 ] 

ASF GitHub Bot commented on KAFKA-8643:
---

hachikuji commented on pull request #7060: KAFKA-8643: bring back public 
MemberDescription constructor
URL: https://github.com/apache/kafka/pull/7060
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incompatible MemberDescription constructor change
> -
>
> Key: KAFKA-8643
> URL: https://issues.apache.org/jira/browse/KAFKA-8643
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Accidentally deleted the existing public constructor interface in the 
> MemberDescription. Need to bring back the old constructors for compatibility.
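A hypothetical sketch of the usual shape of such a compatibility fix: re-add the old public constructor and have it delegate to the new one. The field names and signatures below are illustrative stand-ins, not the actual org.apache.kafka.clients.admin.MemberDescription API.

```java
import java.util.Collections;
import java.util.Set;

// Illustrative shape of restoring a removed public constructor by
// delegating to the newer one; fields are stand-ins, not the real class.
public class MemberDescriptionSketch {
    private final String memberId;
    private final String groupInstanceId;  // newer, optional field
    private final Set<String> assignment;

    // New constructor with the added parameter.
    public MemberDescriptionSketch(String memberId, String groupInstanceId,
                                   Set<String> assignment) {
        this.memberId = memberId;
        this.groupInstanceId = groupInstanceId;
        this.assignment = assignment;
    }

    // Restored old constructor: existing callers keep compiling and
    // running against the signature they were built with.
    public MemberDescriptionSketch(String memberId, Set<String> assignment) {
        this(memberId, null, assignment);
    }

    public String memberId() { return memberId; }

    public static void main(String[] args) {
        MemberDescriptionSketch m =
            new MemberDescriptionSketch("member-1", Collections.emptySet());
        System.out.println(m.memberId());  // member-1
    }
}
```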



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8637) WriteBatch objects leak off-heap memory

2019-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883417#comment-16883417
 ] 

ASF GitHub Bot commented on KAFKA-8637:
---

ableegoldman commented on pull request #7077: KAFKA-8637: WriteBatch objects 
leak off-heap memory
URL: https://github.com/apache/kafka/pull/7077
 
 
   Cherry-pick to 2.1
 



> WriteBatch objects leak off-heap memory
> ---
>
> Key: KAFKA-8637
> URL: https://issues.apache.org/jira/browse/KAFKA-8637
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0, 2.2.0, 2.3.0
>Reporter: Sophie Blee-Goldman
>Priority: Blocker
> Fix For: 2.1.2, 2.2.2, 2.3.1
>
>
> In 2.1 we did some refactoring that led to the WriteBatch objects in 
> RocksDBSegmentedBytesStore#restoreAllInternal being created in a separate 
> method, rather than in a try-with-resources statement as used elsewhere. This 
> causes a memory leak as the WriteBatches are no longer closed automatically
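The fix restores the try-with-resources pattern. A minimal sketch of the difference, using a hypothetical stand-in class since the real org.rocksdb.WriteBatch (an AutoCloseable wrapping native memory) is not reproduced here:

```java
// Stand-in for org.rocksdb.WriteBatch: native memory is only released
// when close() is called. The class and counter are illustrative, not
// part of Kafka or RocksDB.
public class WriteBatchLeakSketch {
    static int openBatches = 0;

    static class FakeWriteBatch implements AutoCloseable {
        FakeWriteBatch() { openBatches++; }
        @Override public void close() { openBatches--; }
    }

    // Leaky shape: the batch is created in a helper method and never
    // closed, as described for RocksDBSegmentedBytesStore#restoreAllInternal.
    static void restoreLeaky() {
        FakeWriteBatch batch = new FakeWriteBatch();
        // ... write restored records into the batch ...
    }

    // Fixed shape: try-with-resources guarantees close() on every path.
    static void restoreFixed() {
        try (FakeWriteBatch batch = new FakeWriteBatch()) {
            // ... write restored records into the batch ...
        }
    }

    public static void main(String[] args) {
        restoreLeaky();
        System.out.println("open after leaky restore: " + openBatches);  // 1
        openBatches = 0;
        restoreFixed();
        System.out.println("open after fixed restore: " + openBatches);  // 0
    }
}
```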





[jira] [Commented] (KAFKA-5866) Let source/sink task to finish their job before exit

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883413#comment-16883413
 ] 

Konstantine Karantasis commented on KAFKA-5866:
---

Now that https://issues.apache.org/jira/browse/KAFKA-5505 has been addressed, some 
of the motivation behind this ticket may no longer apply. Besides that, allowing a task 
to delay shutting down after stop has been signaled might interfere with rebalancing. 

If there's no objection, I'd suggest closing this ticket as "won't fix"

cc [~rhauch] [~ewencp]

> Let source/sink task to finish their job before exit
> 
>
> Key: KAFKA-5866
> URL: https://issues.apache.org/jira/browse/KAFKA-5866
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Oleg Kuznetsov
>Priority: Major
>
> My case is about reading files. When a task stops for a rebalance or another 
> reason, I want to let it read the file to the end at least.
> I found that flag 
> {code:java}
> WorkerTask#stopping
> {code}
>  is set to true and only then 
> {code:java}
> SourceTask.stop()
> {code}
>  is called. This stopping flag prevents WorkerSourceTask from further 
> ingestion (exiting the 
> {code:java}
> while (!isStopped())
> {code}
>  loop).
> Is it possible to let the task decide to work some more time, and possibly 
> produce more records, after stop() has been called on rebalance?
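The requested behavior can be sketched roughly as follows; the queue and method names are illustrative only, not the actual WorkerTask/WorkerSourceTask internals:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative sketch: a task that, once stop is signaled, drains the
// work it has already read (e.g. the rest of a file) before returning,
// instead of exiting immediately.
public class GracefulStopSketch {
    private volatile boolean stopping = false;
    private final Queue<String> workQueue = new ArrayDeque<>();

    void enqueue(String record) { workQueue.add(record); }
    void stop() { stopping = true; }

    // Shape of the current behavior: exit as soon as stopping is set,
    // abandoning whatever is still queued. (A real loop would block in
    // poll() rather than spin.)
    List<String> runAbruptly() {
        List<String> processed = new ArrayList<>();
        while (!stopping) {
            String record = workQueue.poll();
            if (record != null) processed.add(record);
        }
        return processed;
    }

    // Shape of the requested behavior: after stop is signaled, drain the
    // records already read before returning.
    List<String> runDraining() {
        List<String> processed = new ArrayList<>();
        while (!stopping || !workQueue.isEmpty()) {
            String record = workQueue.poll();
            if (record == null) {
                if (stopping) break;  // nothing left and stop requested
            } else {
                processed.add(record);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        GracefulStopSketch task = new GracefulStopSketch();
        task.enqueue("line-1");
        task.enqueue("line-2");
        task.stop();  // stop arrives before the loop runs
        System.out.println(task.runDraining());  // [line-1, line-2]
    }
}
```

As the comment below notes, letting tasks linger like this can interfere with rebalancing, which is the trade-off under discussion.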





[jira] [Issue Comment Deleted] (KAFKA-5827) Allow configuring Kafka sink connectors to start processing records from the end of topics

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-5827:
--
Comment: was deleted

(was: cc [~rhauch] [~ewencp])

> Allow configuring Kafka sink connectors to start processing records from the 
> end of topics
> --
>
> Key: KAFKA-5827
> URL: https://issues.apache.org/jira/browse/KAFKA-5827
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Behrang Saeedzadeh
>Priority: Major
>
> As far as I can see, Kafka connectors start exporting data of a topic from 
> the beginning of its partitions. We have a topic that contains a few million 
> old records that we don't need but we would like to start exporting new 
> records that are added to the topic.
> Basically:
> * When the connector is started for the first time and it does not have a 
> current offset stored, it should start consuming data from the end of topic 
> partitions
> * When the connector is restarted and has a current offset for partitions 
> stored somewhere, it should start from those offsets
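For reference, a commonly suggested workaround lives at the worker level: the `consumer.` pass-through prefix in the Connect worker properties. This is only a sketch of that idea, and note that the setting applies to all sink tasks on the worker, not per connector:

```properties
# Connect worker properties (sketch): the "consumer." prefix is passed
# through to the consumers used by sink tasks. "latest" only takes effect
# when a partition has no committed offset, so restarts that do have
# stored offsets still resume from them - matching both bullets above.
consumer.auto.offset.reset=latest
```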





[jira] [Issue Comment Deleted] (KAFKA-5827) Allow configuring Kafka sink connectors to start processing records from the end of topics

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-5827:
--
Comment: was deleted

(was: Now that https://issues.apache.org/jira/browse/KAFKA-5505 maybe some 
motivation behind this ticket has been addressed. Besides that, allowing a task 
to delay shutting down after stop has been signaled might interfere with 
rebalancing. 

If there's no objection I'd suggest to close this ticket as "won't fix")

> Allow configuring Kafka sink connectors to start processing records from the 
> end of topics
> --
>
> Key: KAFKA-5827
> URL: https://issues.apache.org/jira/browse/KAFKA-5827
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Behrang Saeedzadeh
>Priority: Major
>
> As far as I can see, Kafka connectors start exporting data of a topic from 
> the beginning of its partitions. We have a topic that contains a few million 
> old records that we don't need but we would like to start exporting new 
> records that are added to the topic.
> Basically:
> * When the connector is started for the first time and it does not have a 
> current offset stored, it should start consuming data from the end of topic 
> partitions
> * When the connector is restarted and has a current offset for partitions 
> stored somewhere, it should start from those offsets





[jira] [Commented] (KAFKA-5827) Allow configuring Kafka sink connectors to start processing records from the end of topics

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883410#comment-16883410
 ] 

Konstantine Karantasis commented on KAFKA-5827:
---

cc [~rhauch] [~ewencp]

> Allow configuring Kafka sink connectors to start processing records from the 
> end of topics
> --
>
> Key: KAFKA-5827
> URL: https://issues.apache.org/jira/browse/KAFKA-5827
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Behrang Saeedzadeh
>Priority: Major
>
> As far as I can see, Kafka connectors start exporting data of a topic from 
> the beginning of its partitions. We have a topic that contains a few million 
> old records that we don't need but we would like to start exporting new 
> records that are added to the topic.
> Basically:
> * When the connector is started for the first time and it does not have a 
> current offset stored, it should start consuming data from the end of topic 
> partitions
> * When the connector is restarted and has a current offset for partitions 
> stored somewhere, it should start from those offsets





[jira] [Commented] (KAFKA-5827) Allow configuring Kafka sink connectors to start processing records from the end of topics

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883409#comment-16883409
 ] 

Konstantine Karantasis commented on KAFKA-5827:
---

Now that https://issues.apache.org/jira/browse/KAFKA-5505 has been addressed, some 
of the motivation behind this ticket may no longer apply. Besides that, allowing a task 
to delay shutting down after stop has been signaled might interfere with rebalancing. 

If there's no objection, I'd suggest closing this ticket as "won't fix"

> Allow configuring Kafka sink connectors to start processing records from the 
> end of topics
> --
>
> Key: KAFKA-5827
> URL: https://issues.apache.org/jira/browse/KAFKA-5827
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Behrang Saeedzadeh
>Priority: Major
>
> As far as I can see, Kafka connectors start exporting data of a topic from 
> the beginning of its partitions. We have a topic that contains a few million 
> old records that we don't need but we would like to start exporting new 
> records that are added to the topic.
> Basically:
> * When the connector is started for the first time and it does not have a 
> current offset stored, it should start consuming data from the end of topic 
> partitions
> * When the connector is restarted and has a current offset for partitions 
> stored somewhere, it should start from those offsets





[jira] [Updated] (KAFKA-5716) Connect: When SourceTask.commit it is possible not everything from SourceTask.poll has been sent

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-5716:
--
Labels: needs-kip  (was: )

> Connect: When SourceTask.commit it is possible not everything from 
> SourceTask.poll has been sent
> ---
>
> Key: KAFKA-5716
> URL: https://issues.apache.org/jira/browse/KAFKA-5716
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>Priority: Minor
>  Labels: needs-kip
> Attachments: KAFKA-5716.patch
>
>
> Not looking at the very latest code, so the "problem" may have been corrected 
> recently. If so, I apologize. I found the "problem" by code-inspection alone, 
> so I may be wrong. Have not had the time to write tests to confirm.
> According to java-doc on SourceTask.commit
> {quote}
> Commit the offsets, up to the offsets that have been returned by \{@link 
> #poll()}. This
> method should block until the commit is complete.
> SourceTasks are not required to implement this functionality; Kafka Connect 
> will record offsets
> automatically. This hook is provided for systems that also need to store 
> offsets internally
> in their own system.
> {quote}
> As I read this, when commit-method is called, the SourceTask-developer is 
> "told" that everything returned from poll up until "now" has been sent/stored 
> - both the outgoing messages and the associated connect-offsets. Looking at 
> the implementation it also seems that this is what it tries to 
> "guarantee/achieve".
> But as I read the code, this is not necessarily true.
> The following threads are involved:
> * Task-thread: WorkerSourceTask has its own thread running 
> WorkerSourceTask.execute.
> * Committer-thread: from time to time SourceTaskOffsetCommitter is scheduled 
> to call WorkerSourceTask.commitOffsets (from a different thread)
> The two threads synchronize (on the WorkerSourceTask object) in sendRecords and 
> commitOffsets respectively, preventing the task-thread from adding to 
> outstandingMessages and offsetWriter while the committer-thread is marking what 
> has to be flushed in the offsetWriter and waiting for outstandingMessages to 
> be empty. This means that the offsets committed will be consistent with what 
> has been sent out, but not necessarily with what has been polled. At least I do 
> not see why the following is not possible:
> * The task-thread polls something from task.poll
> * Before the task-thread gets to add (all) the polled records to 
> outstandingMessages and offsetWriter in sendRecords, the committer-thread kicks 
> in and does its committing, while preventing the task-thread from adding the 
> polled records to outstandingMessages and offsetWriter
> * Consistency will not have been compromised, but the committer-thread will end 
> up calling task.commit (via WorkerSourceTask.commitSourceTask) without the 
> records just polled from task.poll having been sent or the corresponding 
> connector-offsets flushed.
> If I am right, I guess there are two ways to fix it:
> * Either change the java-doc of SourceTask.commit to something like the 
> following (which I do believe is true):
> {quote}
> Commit the offsets, up to the offsets that have been returned by \{@link 
> #poll()}
> *and confirmed by a call to \{@link #commitRecord(SourceRecord)}*.
> This method should block until the commit is complete.
> SourceTasks are not required to implement this functionality; Kafka Connect 
> will record offsets
> automatically. This hook is provided for systems that also need to store 
> offsets internally
> in their own system.
> {quote}
> * or, fix the "problem" so that it actually does what the java-doc says :-)
> If I am not right, I of course apologize for the inconvenience. I would 
> appreciate an explanation of where my code-inspection is incorrect, and why it 
> works even though I cannot see it. I will not expect such an explanation, 
> though.
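The interleaving described above can be reduced to a small sketch. The field and method names are stand-ins, not the real WorkerSourceTask members; the point is only that the committer can observe a "clean" state in the window between poll() returning and sendRecords() registering the polled records:

```java
// Illustrative reduction of the race: records returned by poll() sit in
// polledNotSent until sendRecords() registers them under the lock, but
// the commit path only inspects what has been registered.
public class CommitRaceSketch {
    private final Object lock = new Object();
    private int outstanding = 0;    // sent but not yet acknowledged
    private int polledNotSent = 0;  // returned by poll(), not yet registered

    int poll(int n) {               // task thread; no lock held here
        polledNotSent = n;
        return n;
    }

    void sendRecords() {            // task thread
        synchronized (lock) {
            outstanding += polledNotSent;
            polledNotSent = 0;
        }
    }

    boolean commitThinksUpToDate() { // committer thread
        synchronized (lock) {
            // Mirrors the description: commit waits only for outstanding
            // messages; it cannot see records still held by the task thread.
            return outstanding == 0;
        }
    }

    public static void main(String[] args) {
        CommitRaceSketch task = new CommitRaceSketch();
        task.poll(5);  // 5 records polled, not yet passed to sendRecords()
        // Committer thread runs here, before sendRecords():
        System.out.println("commit thinks up to date: "
                + task.commitThinksUpToDate());  // true
        task.sendRecords();  // records registered only after the commit
    }
}
```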





[jira] [Updated] (KAFKA-4793) Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed tasks

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-4793:
--
Labels: needs-kip  (was: )

> Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed 
> tasks
> -
>
> Key: KAFKA-4793
> URL: https://issues.apache.org/jira/browse/KAFKA-4793
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Priority: Major
>  Labels: needs-kip
>
> Sometimes tasks stop due to repeated failures. Users will want to restart the 
> connector and have it retry after fixing an issue. 
> We expected "POST /connectors/(string: name)/restart" to cause retry of 
> failed tasks, but this doesn't appear to be the case.





[jira] [Commented] (KAFKA-5740) Use separate file for HTTP logs

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883403#comment-16883403
 ] 

Konstantine Karantasis commented on KAFKA-5740:
---

Now that 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs
 has been merged and could help a lot with classifying log messages per 
context, should we close this ticket? 

> Use separate file for HTTP logs
> ---
>
> Key: KAFKA-5740
> URL: https://issues.apache.org/jira/browse/KAFKA-5740
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Priority: Critical
>  Labels: monitoring, usability
>
> Currently the HTTP logs are interspersed in the normal output. However, 
> client usage/requests should be logged to a separate file.
> Question: should the HTTP logs be *removed* from the normal Connect logs?





[jira] [Commented] (KAFKA-5741) Prioritize threads in Connect distributed worker process

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883402#comment-16883402
 ] 

Konstantine Karantasis commented on KAFKA-5741:
---

[~rhauch] [~ewencp] do we think this is still important after 
https://issues.apache.org/jira/browse/KAFKA-5505 and 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
 ?

If not, maybe we could close this ticket.

> Prioritize threads in Connect distributed worker process
> 
>
> Key: KAFKA-5741
> URL: https://issues.apache.org/jira/browse/KAFKA-5741
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Priority: Critical
>
> Connect's distributed worker process uses the {{DistributedHerder}} to 
> perform all administrative operations, including: starting, stopping, 
> pausing, resuming, reconfiguring connectors; rebalancing; etc. The 
> {{DistributedHerder}} uses a single threaded executor service to do all this 
> work and to do it sequentially. If this thread gets preempted for any reason 
> (e.g., connector tasks are bogging down the process, DoS, etc.), then the 
> herder's membership in the group may be dropped, causing a rebalance.
> This herder thread should be run at a much higher priority than all of the 
> other threads in the system.





[jira] [Commented] (KAFKA-5696) SourceConnector does not commit offset on rebalance

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883401#comment-16883401
 ] 

Konstantine Karantasis commented on KAFKA-5696:
---

Now that https://issues.apache.org/jira/browse/KAFKA-5505 has been fixed, 
should we also close this ticket, since the impact is much lower now? 
Reconfiguration of a single connector won't stop other unaffected connectors or 
tasks. 

Thoughts?

> SourceConnector does not commit offset on rebalance
> ---
>
> Key: KAFKA-5696
> URL: https://issues.apache.org/jira/browse/KAFKA-5696
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Oleg Kuznetsov
>Priority: Critical
>  Labels: newbie
>
> I'm running a SourceConnector that reads files from storage and puts data in 
> Kafka. In case of reconfiguration, I want offsets to be flushed. 
> Say a file is completely processed, but its source records are not yet committed, 
> and in case of reconfiguration their offsets might be missing from the store.
> Is it possible to force committing offsets on reconfiguration?





[jira] [Updated] (KAFKA-5688) Add a modifier to the REST endpoint to only show errors

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-5688:
--
Labels: needs-kip  (was: )

> Add a modifier to the REST endpoint to only show errors
> ---
>
> Key: KAFKA-5688
> URL: https://issues.apache.org/jira/browse/KAFKA-5688
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Dustin Cote
>Priority: Minor
>  Labels: needs-kip
>
> Today the REST endpoint for workers to validate configuration is pretty hard 
> to read. It would be nice if we had a modifier for the /validate endpoint 
> that only showed configuration errors. 





[jira] [Updated] (KAFKA-5699) Validate and Create connector endpoint should take the same format message body

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-5699:
--
Labels: needs-kip  (was: )

> Validate and Create connector endpoint should take the same format message 
> body
> ---
>
> Key: KAFKA-5699
> URL: https://issues.apache.org/jira/browse/KAFKA-5699
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Robin Moffatt
>Priority: Minor
>  Labels: needs-kip
>
> It's a fairly ugly UX to want to 'do the right thing' and validate a 
> connector, but to have to do so with a different message body than that used 
> for a POST to /connectors. Can the format be standardised across the calls 
> (and for a PUT to //config too)?





[jira] [Resolved] (KAFKA-5635) KIP-181 Kafka-Connect integrate with kafka ReST Proxy

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis resolved KAFKA-5635.
---
Resolution: Won't Fix

> KIP-181 Kafka-Connect integrate with kafka ReST Proxy
> -
>
> Key: KAFKA-5635
> URL: https://issues.apache.org/jira/browse/KAFKA-5635
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Dhananjay Patkar
>Priority: Major
>  Labels: features, newbie
>
> Kafka Connect currently uses Kafka clients, which connect directly to Kafka 
> brokers. 
> In a use case where I have many Kafka Connect [producers] running remotely, 
> it's a challenge to configure broker information on every Connect agent.
> Also, in case of an IP change [upgrade or cluster re-creation], we need to 
> update every remote Connect configuration.
> If Kafka Connect source connectors talked to a REST endpoint, the client would 
> be unaware of broker details. This way we could transparently upgrade / 
> re-create the Kafka cluster as long as the REST endpoint remains the same.





[jira] [Commented] (KAFKA-5635) KIP-181 Kafka-Connect integrate with kafka ReST Proxy

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883397#comment-16883397
 ] 

Konstantine Karantasis commented on KAFKA-5635:
---

Closing given [~ewencp]'s reasoning and the fact that there hasn't been any 
discussion for over a year. 

But feel free to reopen if you think additional discussion is desirable. 

> KIP-181 Kafka-Connect integrate with kafka ReST Proxy
> -
>
> Key: KAFKA-5635
> URL: https://issues.apache.org/jira/browse/KAFKA-5635
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Dhananjay Patkar
>Priority: Major
>  Labels: features, newbie
>
> Kafka Connect currently uses Kafka clients, which connect directly to Kafka 
> brokers. 
> In a use case where I have many Kafka Connect [producers] running remotely, 
> it's a challenge to configure broker information on every Connect agent.
> Also, in case of an IP change [upgrade or cluster re-creation], we need to 
> update every remote Connect configuration.
> If Kafka Connect source connectors talked to a REST endpoint, the client would 
> be unaware of broker details. This way we could transparently upgrade / 
> re-create the Kafka cluster as long as the REST endpoint remains the same.





[jira] [Commented] (KAFKA-5538) User-specified Producer/Consumer config doesn't take effect with KafkaBackingStore(Config/Status/Offset)

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883396#comment-16883396
 ] 

Konstantine Karantasis commented on KAFKA-5538:
---

It's been a while since this issue was discussed. In the meantime 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy
 was introduced. Hopefully this provides more flexibility and makes it clearer 
why the KafkaBackingStore clients should be unaffected by those prefixed 
settings. 

I'd suggest closing this issue and its PR if there are no objections. 

> User-specified Producer/Consumer config doesn't take effect with 
> KafkaBackingStore(Config/Status/Offset)
> ---
>
> Key: KAFKA-5538
> URL: https://issues.apache.org/jira/browse/KAFKA-5538
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Aegeaner
>Assignee: Aegeaner
>Priority: Major
>
> For configuration of Kafka source and Kafka sink tasks, the same parameters 
> can be used but need to be prefixed with consumer. and producer. The worker 
> will strip the prefix and get the user-specified configurations, but the 
> KafkaBackingStores will not. All three KafkaBackingStores just take the 
> originals from the Kafka config without stripping the prefix, so the 
> producer/consumer will ignore these configurations (e.g. Kerberos configs).
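The prefix handling the report describes can be sketched as follows; stripPrefix is an illustrative helper, not the actual Worker code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the worker-side prefix stripping described above. A backing
// store that forwards the worker config unchanged hands the producer the
// literal key "producer.sasl.mechanism", which the producer ignores.
public class PrefixStripSketch {
    static Map<String, String> stripPrefix(Map<String, String> config, String prefix) {
        Map<String, String> stripped = new HashMap<>();
        for (Map.Entry<String, String> e : config.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                stripped.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return stripped;
    }

    public static void main(String[] args) {
        Map<String, String> worker = new HashMap<>();
        worker.put("producer.sasl.mechanism", "GSSAPI");
        worker.put("bootstrap.servers", "broker:9092");
        // What a worker-built producer sees after stripping:
        System.out.println(stripPrefix(worker, "producer."));  // {sasl.mechanism=GSSAPI}
    }
}
```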





[jira] [Comment Edited] (KAFKA-5538) User-specified Producer/Consumer config doesn't take effect with KafkaBackingStore(Config/Status/Offset)

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883396#comment-16883396
 ] 

Konstantine Karantasis edited comment on KAFKA-5538 at 7/11/19 11:17 PM:
-

It's been a while since this issue was discussed. In the meantime 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy
 was introduced. Hopefully this provides more flexibility and makes it clearer 
why the KafkaBackingStore clients should be unaffected by those prefixed 
settings. 

I'd suggest closing this issue and its PR if there are no objections. 


was (Author: kkonstantine):
It's been a while since this issue was discussed. In the meantime 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy
 was introduced. Hopefully this provides more flexibility and makes more clear 
why the KafkaBackingStore clients should be unaffected by those prefixed 
settings. 

I'd suggest closing this issue and its PR if there are not objections. 

> User-specified Producer/Consumer config doesn't take effect with 
> KafkaBackingStore(Config/Status/Offset)
> ---
>
> Key: KAFKA-5538
> URL: https://issues.apache.org/jira/browse/KAFKA-5538
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Aegeaner
>Assignee: Aegeaner
>Priority: Major
>
> For configuration of Kafka source and Kafka sink tasks, the same parameters 
> can be used but need to be prefixed with consumer. and producer. The worker 
> will strip the prefix and get the user-specified configurations, but the 
> KafkaBackingStores will not. All three KafkaBackingStores just take the 
> originals from the Kafka config without stripping the prefix, so the 
> producer/consumer will ignore these configurations (e.g. Kerberos configs).





[jira] [Commented] (KAFKA-8502) Flakey test AdminClientIntegrationTest#testElectUncleanLeadersForAllPartitions

2019-07-11 Thread Boyang Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883390#comment-16883390
 ] 

Boyang Chen commented on KAFKA-8502:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6175/consoleFull] 
failed again

 

> Flakey test AdminClientIntegrationTest#testElectUncleanLeadersForAllPartitions
> --
>
> Key: KAFKA-8502
> URL: https://issues.apache.org/jira/browse/KAFKA-8502
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
> Attachments: log-job6046.txt
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5355/consoleFull]
>  
> kafka.api.AdminClientIntegrationTest > testElectUncleanLeadersForAllPartitions FAILED
>     java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout.
>         at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>         at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>         at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>         at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>         at kafka.api.AdminClientIntegrationTest.testElectUncleanLeadersForAllPartitions(AdminClientIntegrationTest.scala:1496)
>     Caused by:
>         org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout.





[jira] [Resolved] (KAFKA-8424) Replace ListGroups request/response with automated protocol

2019-07-11 Thread Boyang Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-8424.

Resolution: Fixed

> Replace ListGroups request/response with automated protocol
> ---
>
> Key: KAFKA-8424
> URL: https://issues.apache.org/jira/browse/KAFKA-8424
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>






[jira] [Updated] (KAFKA-4353) Add semantic types to Kafka Connect

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-4353:
--
Labels: needs-kip  (was: )

> Add semantic types to Kafka Connect
> ---
>
> Key: KAFKA-4353
> URL: https://issues.apache.org/jira/browse/KAFKA-4353
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.1
>Reporter: Randall Hauch
>Priority: Major
>  Labels: needs-kip
>
> Kafka Connect's schema system defines several _core types_ that consist of:
> * STRUCT
> * ARRAY
> * MAP
> plus these _primitive types_:
> * INT8
> * INT16
> * INT32
> * INT64
> * FLOAT32
> * FLOAT64
> * BOOLEAN
> * STRING
> * BYTES
> The {{Schema}} for these core types define several attributes, but they do 
> not have a name.
> Kafka Connect also defines several _logical types_ that are specializations 
> of the primitive types and _do_ have schema names _and_ are automatically 
> mapped to/from Java objects:
> || Schema Name || Primitive Type || Java value class || Description ||
> | o.k.c.d.Decimal | {{BYTES}} | {{java.math.BigDecimal}} | An 
> arbitrary-precision signed decimal number. |
> | o.k.c.d.Date | {{INT32}} | {{java.util.Date}} | A date representing a 
> calendar day with no time of day or timezone. The {{java.util.Date}} value's 
> hours, minutes, seconds, milliseconds are set to 0. The underlying 
> representation is an integer representing the number of standardized days 
> (based on a number of milliseconds with 24 hours/day, 60 minutes/hour, 60 
> seconds/minute, 1000 milliseconds/second) since Unix epoch. |
> | o.k.c.d.Time | {{INT32}} | {{java.util.Date}} | A time representing a 
> specific point in a day, not tied to any specific date. Only the 
> {{java.util.Date}} value's hours, minutes, seconds, and milliseconds can be 
> non-zero. This effectively makes it a point in time during the first day 
> after the Unix epoch. The underlying representation is an integer 
> representing the number of milliseconds after midnight. |
> | o.k.c.d.Timestamp | {{INT64}} | {{java.util.Date}} | A timestamp 
> representing an absolute time, without timezone information. The underlying 
> representation is a long representing the number of milliseconds since the 
> Unix epoch. |
> where "o.k.c.d" is short for {{org.kafka.connect.data}}. [~ewencp] has stated 
> in the past that adding more logical types is challenging and generally 
> undesirable, since everyone use Kafka Connect values have to deal with all 
> new logical types.
> This proposal adds standard _semantic_ types that are somewhere between the 
> core types and logical types. Basically, they are just predefined schemas 
> that have names and are based on other primitive types. However, there is no 
> mapping to another form other than the primitive.
> The purpose of semantic types is to provide hints as to how the values _can_ 
> be treated. Of course, clients are free to ignore the hints of some or all of 
> the built-in semantic types, and in these cases would treat the values as the 
> primitive value with no extra semantics. This behavior makes it much easier 
> to add new semantic types over time without risking incompatibilities.
> Really, any source connector can define custom semantic types, but there is 
> tremendous value in having a library of standard, well-known semantic types, 
> including:
> || Schema Name || Primitive Type || Description ||
> | o.k.c.d.Uuid | {{STRING}} | A UUID in string form.|
> | o.k.c.d.Json | {{STRING}} | A JSON document, array, or scalar in string 
> form.|
> | o.k.c.d.Xml | {{STRING}} | An XML document in string form.|
> | o.k.c.d.BitSet | {{STRING}} | A string of zero or more {{0}} or {{1}} 
> characters.|
> | o.k.c.d.ZonedTime | {{STRING}} | An ISO-8601 formatted representation of a 
> time (with fractional seconds) with timezone or offset from UTC.|
> | o.k.c.d.ZonedTimestamp | {{STRING}} | An ISO-8601 formatted representation 
> of a timestamp with timezone or offset from UTC.|
> | o.k.c.d.EpochDays | {{INT64}} | A date with no time or timezone 
> information, represented as the number of days since (or before) epoch, or 
> January 1, 1970, at 00:00:00UTC.|
> | o.k.c.d.Year | {{INT32}} | The year number.|
> | o.k.c.d.MilliTime | {{INT32}} | Number of milliseconds past midnight.|
> | o.k.c.d.MicroTime | {{INT64}} | Number of microseconds past midnight.|
> | o.k.c.d.NanoTime | {{INT64}} | Number of nanoseconds past midnight.|
> | o.k.c.d.MilliTimestamp | {{INT64}} | Number of milliseconds past epoch.|
> | o.k.c.d.MicroTimestamp | {{INT64}} | Number of microseconds past epoch.|
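
The hint-with-fallback behavior described above can be sketched in a few lines
(illustrative Python, not the actual Connect API; the names mirror the tables
above):

```python
# Illustrative model of semantic types: named schemas layered over
# primitive types, carrying a hint but no special value mapping.
# This is NOT the Kafka Connect API, just a sketch of the proposal.

PRIMITIVES = {"INT8", "INT16", "INT32", "INT64",
              "FLOAT32", "FLOAT64", "BOOLEAN", "STRING", "BYTES"}

# A semantic type is just a (schema name -> primitive type) mapping.
SEMANTIC_TYPES = {
    "o.k.c.d.Uuid": "STRING",
    "o.k.c.d.Json": "STRING",
    "o.k.c.d.EpochDays": "INT64",
    "o.k.c.d.MilliTimestamp": "INT64",
}

def effective_type(schema_name, primitive):
    """Clients that recognise the name may apply its semantics;
    everyone else safely falls back to the primitive type."""
    if schema_name in SEMANTIC_TYPES:
        # A well-formed semantic type must agree with its primitive.
        assert SEMANTIC_TYPES[schema_name] == primitive
        return schema_name
    return primitive
```

A client that does not recognise `o.k.c.d.Uuid` simply treats the value as a
plain `STRING`, which is what makes adding new semantic types compatible.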



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-4400) Prefix for sink task consumer groups should be configurable

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-4400:
--
Labels: needs-kip newbie  (was: newbie)

> Prefix for sink task consumer groups should be configurable
> ---
>
> Key: KAFKA-4400
> URL: https://issues.apache.org/jira/browse/KAFKA-4400
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Ewen Cheslack-Postava
>Priority: Major
>  Labels: needs-kip, newbie
>
> Currently the prefix for creating consumer groups is fixed. This means that 
> if you run multiple Connect clusters using the same Kafka cluster and create 
> connectors with the same name, sink tasks in different clusters will join the 
> same group. Making this prefix configurable at the worker level would protect 
> against this.
> An alternative would be to define unique cluster IDs for each connect 
> cluster, which would allow us to construct a unique name for the group 
> without requiring yet another config (but presents something of a 
> compatibility challenge).
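
A sketch of the collision and of the proposed worker-level prefix (the
`connect-` default matches Connect's usual sink group naming; the rest is
illustrative, not framework code):

```python
def sink_group_id(connector_name, prefix="connect-"):
    """Consumer group id for a sink connector. With a fixed default
    prefix, two Connect clusters sharing one Kafka cluster collide
    whenever they use the same connector name."""
    return prefix + connector_name

# Two clusters, same connector name, default prefix: same group id.
group_a = sink_group_id("hdfs-sink")
group_b = sink_group_id("hdfs-sink")

# With a worker-level configurable prefix the groups stay distinct.
group_c = sink_group_id("hdfs-sink", prefix="connect-clusterB-")
```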



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-5000) Framework should log some progress information regularly to assist in troubleshooting

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883385#comment-16883385
 ] 

Konstantine Karantasis commented on KAFKA-5000:
---

Now that Connect metrics have been added, do we think this is still an issue?
[~rhauch] [~ewencp] thoughts?

> Framework should log some progress information regularly to assist in 
> troubleshooting
> -
>
> Key: KAFKA-5000
> URL: https://issues.apache.org/jira/browse/KAFKA-5000
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Priority: Major
>
> We get many questions of the type:
> "I started a connector, it doesn't seem to make any progress, I don't know 
> what to do"
> I think that periodic "progress reports" on the worker logs may help. 
> We have the offset commit message: "INFO 
> WorkerSinkTask{id=cassandra-sink-connector-0} Committing offsets 
> (org.apache.kafka.connect.runtime.WorkerSinkTask:244)"
> But I think we'd also want to know: topic, partition, offsets, how many rows 
> were read from source/kafka and how many were successfully written.
> This will help determine if there is any progress being made and whether some 
> partitions are "stuck" for some reason.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-4159) Allow overriding producer & consumer properties at the connector level

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883380#comment-16883380
 ] 

Konstantine Karantasis edited comment on KAFKA-4159 at 7/11/19 10:40 PM:
-

This functionality was discussed again in more recent tickets, which were also 
accompanied by KIPs. Specifically: 
 https://issues.apache.org/jira/browse/KAFKA-6890

and 
 https://issues.apache.org/jira/browse/KAFKA-8265

Given that the latter and most recent was approved and merged, I'd suggest at 
least closing this ticket here as superseded by 
https://issues.apache.org/jira/browse/KAFKA-8265 and 
[KIP-458|https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy]

[~rhauch] [~ewencp] thoughts?


was (Author: kkonstantine):
This functionality was discussed again in more recent tickets, which were also 
accompanied by KIPs. Specifically: 
https://issues.apache.org/jira/browse/KAFKA-6890


and 
https://issues.apache.org/jira/browse/KAFKA-8265

Given that the latter and most recent was approved and merged, I'd suggest at 
least closing this ticket here as superseded by 
https://issues.apache.org/jira/browse/KAFKA-8265 and 
[KIP-458|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy]].
 

[~rhauch] [~ewencp] thoughts?

> Allow overriding producer & consumer properties at the connector level
> --
>
> Key: KAFKA-4159
> URL: https://issues.apache.org/jira/browse/KAFKA-4159
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Stephen Durfey
>Priority: Major
>  Labels: needs-kip
>
> As an example use case: overriding a sink connector's consumer's partition 
> assignment strategy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-4159) Allow overriding producer & consumer properties at the connector level

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883380#comment-16883380
 ] 

Konstantine Karantasis edited comment on KAFKA-4159 at 7/11/19 10:40 PM:
-

This functionality was discussed again in more recent tickets, which were also 
accompanied by KIPs. Specifically: 
 https://issues.apache.org/jira/browse/KAFKA-6890

and 
 https://issues.apache.org/jira/browse/KAFKA-8265

Given that the latter and most recent was approved and merged, I'd suggest at 
least closing this ticket here as superseded by 
https://issues.apache.org/jira/browse/KAFKA-8265 and 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy

[~rhauch] [~ewencp] thoughts?


was (Author: kkonstantine):
This functionality was discussed again in more recent tickets, which were also 
accompanied by KIPs. Specifically: 
 https://issues.apache.org/jira/browse/KAFKA-6890

and 
 https://issues.apache.org/jira/browse/KAFKA-8265

Given that the latter and most recent was approved and merged, I'd suggest at 
least closing this ticket here as superseded by 
https://issues.apache.org/jira/browse/KAFKA-8265 and 
[KIP-458|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy]]

[~rhauch] [~ewencp] thoughts?

> Allow overriding producer & consumer properties at the connector level
> --
>
> Key: KAFKA-4159
> URL: https://issues.apache.org/jira/browse/KAFKA-4159
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Stephen Durfey
>Priority: Major
>  Labels: needs-kip
>
> As an example use case: overriding a sink connector's consumer's partition 
> assignment strategy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-4159) Allow overriding producer & consumer properties at the connector level

2019-07-11 Thread Konstantine Karantasis (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883380#comment-16883380
 ] 

Konstantine Karantasis commented on KAFKA-4159:
---

This functionality was discussed again in more recent tickets, which were also 
accompanied by KIPs. Specifically: 
https://issues.apache.org/jira/browse/KAFKA-6890


and 
https://issues.apache.org/jira/browse/KAFKA-8265

Given that the latter and most recent was approved and merged, I'd suggest at 
least closing this ticket here as superseded by 
https://issues.apache.org/jira/browse/KAFKA-8265 and 
[KIP-458|https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy].
 

[~rhauch] [~ewencp] thoughts?

> Allow overriding producer & consumer properties at the connector level
> --
>
> Key: KAFKA-4159
> URL: https://issues.apache.org/jira/browse/KAFKA-4159
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Stephen Durfey
>Priority: Major
>  Labels: needs-kip
>
> As an example use case: overriding a sink connector's consumer's partition 
> assignment strategy.
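
For reference, KIP-458 (which supersedes this ticket) exposes such overrides
through prefixed connector configs; a sketch of the extraction (the
`consumer.override.` / `producer.override.` prefixes match the KIP, the
surrounding code is illustrative):

```python
def client_overrides(connector_config, prefix="consumer.override."):
    """Extract per-connector client overrides from a connector config,
    in the spirit of KIP-458's producer.override./consumer.override.
    prefixes. Non-prefixed keys are left for the connector itself."""
    return {k[len(prefix):]: v
            for k, v in connector_config.items()
            if k.startswith(prefix)}

cfg = {
    "name": "my-sink",
    "consumer.override.partition.assignment.strategy":
        "org.apache.kafka.clients.consumer.RoundRobinAssignor",
    "consumer.override.max.poll.records": "100",
}
overrides = client_overrides(cfg)
```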



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-4028) Add Connect cluster ID and expose it in REST API

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis updated KAFKA-4028:
--
Labels: needs-kip  (was: )

> Add Connect cluster ID and expose it in REST API
> 
>
> Key: KAFKA-4028
> URL: https://issues.apache.org/jira/browse/KAFKA-4028
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Ewen Cheslack-Postava
>Priority: Major
>  Labels: needs-kip
>
> We have some basic info about the server available via GET / (currently 
> version information). It'd be nice to have some additional cluster metadata 
> available via the REST API (perhaps under a /cluster endpoint). A cluster ID 
> would be a good starting point, although we'll need to decide whether we 
> really want this as a global view of the cluster or a set of APIs that give 
> you info about the individual worker (where some values should simply be 
> consistent across the cluster).
> There are a couple of ways we could implement cluster IDs:
> * An entirely new config
> * If we could get some unique ID for the Kafka cluster, leverage the name of 
> the config topic. This doesn't require a new worker config, but the name 
> probably isn't ideal -- it might include a reasonable prefix, but will also 
> often include the suffix "-config" which will look odd.
> * If we could get some unique ID for the Kafka cluster and implement 
> KAFKA-3254, we could automatically generate one as (Kafka cluster ID, topic 
> prefix)
> Note that some of these are assuming distributed mode. We'd have to figure 
> out a scheme that can also be applied to standalone clusters. Backwards 
> compatibility is also a concern since we'd rather not introduce any new 
> required configs if possible.
> As this is new public API, it'll need a KIP before implementation.
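
The third option above, deriving the ID from the Kafka cluster ID plus the
topic prefix, can be sketched as follows (illustrative only; assumes a
KAFKA-3254-style config topic named with a `-config` suffix, and is not an
implemented API):

```python
def connect_cluster_id(kafka_cluster_id, config_topic):
    """Derive a Connect cluster ID without adding a new worker config,
    by combining the Kafka cluster ID with the config topic's prefix,
    e.g. 'my-connect-config' -> prefix 'my-connect'."""
    prefix = config_topic.rsplit("-config", 1)[0]
    return f"{kafka_cluster_id}:{prefix}"
```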



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-7688) Allow byte array class for Decimal Logical Types to fix Debezium Issues

2019-07-11 Thread Konstantine Karantasis (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis resolved KAFKA-7688.
---
   Resolution: Fixed
Fix Version/s: (was: 1.1.1)

This issue was resolved outside the Apache Kafka repo with this fix: 
https://github.com/confluentinc/schema-registry/pull/1020

Feel free to re-open if there's still missing functionality that belongs 
strictly to the Kafka Connect framework. 

> Allow byte array class for Decimal Logical Types to fix Debezium Issues
> ---
>
> Key: KAFKA-7688
> URL: https://issues.apache.org/jira/browse/KAFKA-7688
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.1.1
>Reporter: Eric C Abis
>Priority: Blocker
>
> Decimal Logical Type fields are failing with Kafka Connect sink tasks and 
> showing this error:
> {code:java}
> Invalid Java object for schema type BYTES: class [B for field: "null"{code}
> There is an issue tracker for the problem here in the Confluent Schema 
> Registry tracker (it's all related):  
> [https://github.com/confluentinc/schema-registry/issues/833]
> I've created a fix for this issue and tested and verified it in our CF4 
> cluster here at Shutterstock.
> The issue boils down to the fact that Avro Decimal Logical types store values 
> as a Byte Arrays. Debezium sets the Default Value as Base64 encoded Byte 
> Arrays and record values as Big Integer Byte Arrays.    I'd like to submit a 
> PR that changes the SCHEMA_TYPE_CLASSES hash map in 
> org.apache.kafka.connect.data.ConnectSchema to allow Byte Arrays for Decimal 
> fields. 
> I reached out to us...@kafka.apache.org to ask for GitHub permissions, but if 
> there is somewhere else I need to reach out to, please let me know.
> My GitHub user is TheGreatAbyss
> Thank You!
> Eric
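
For context, Connect's {{Decimal}} logical type stores the unscaled value as a
big-endian two's-complement byte array together with a scale parameter; a
Python sketch of that decoding (illustrative, not the proposed Java fix):

```python
from decimal import Decimal

def decode_connect_decimal(raw: bytes, scale: int) -> Decimal:
    """Interpret a Connect/Avro Decimal: a big-endian two's-complement
    unscaled integer, shifted right by `scale` decimal places."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# Unscaled value 1234 with scale 2 represents 12.34.
raw = (1234).to_bytes(2, byteorder="big", signed=True)
```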



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8657) Automatic Topic Creation on Producer

2019-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883348#comment-16883348
 ] 

ASF GitHub Bot commented on KAFKA-8657:
---

jolshan commented on pull request #7075: KAFKA-8657: Automatic Topic Creation 
on Producer
URL: https://github.com/apache/kafka/pull/7075
 
 
   [WIP]
   
   This is a PR for KIP 487: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer
   
   To deprecate the broker configuration to automatically create topics, the 
producer now has configurations to automatically create topics. These include 
enabling auto-creation, the number of partitions and replication factor for 
automatically created topics. Tests have been added to show that the topic can 
be created automatically when the broker configuration is no longer enabled.
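
The decision flow the KIP describes might look like this sketch (the config
semantics here are placeholders from the KIP discussion, not released Kafka
configuration):

```python
def should_producer_create_topic(topic_exists,
                                 broker_auto_create,
                                 producer_auto_create):
    """Placeholder logic for KIP-487: the producer only issues its own
    create-topics request when the topic is missing, broker-side
    auto-creation (the deprecated path) is disabled, and the producer
    has opted in to auto-creation."""
    if topic_exists:
        return False
    if broker_auto_create:
        return False  # the broker metadata path still creates it
    return producer_auto_create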
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Automatic Topic Creation on Producer
> 
>
> Key: KAFKA-8657
> URL: https://issues.apache.org/jira/browse/KAFKA-8657
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Minor
>
> Kafka has a feature that allows for the auto-creation of topics. Usually this 
> is done through a metadata request to the broker. KIP 487 aims to give the 
> producer the functionality to auto-create topics through a separate request. 
> See 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8634) Update ZooKeeper to 3.5.5

2019-07-11 Thread James Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883294#comment-16883294
 ] 

James Cheng commented on KAFKA-8634:


[~ijuma]: Prior to this PR, can I use Kafka to talk to ZooKeeper instances 
running 3.5.5? (Meaning, I think, Kafka using the ZooKeeper 3.4.14 client, 
talking to a ZooKeeper 3.5.5 server.)

 

Or, is this PR required to even talk to a Zookeeper 3.5.5 server?

 

> Update ZooKeeper to 3.5.5
> -
>
> Key: KAFKA-8634
> URL: https://issues.apache.org/jira/browse/KAFKA-8634
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Major
> Fix For: 2.4.0
>
>
> ZooKeeper 3.5.5 is the first stable release in the 3.5.x series. The key new 
> feature in ZK 3.5.x is TLS support.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8648) Console tools should fail fast if an unrecognised --property is passed in

2019-07-11 Thread Stanislav Kozlovski (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883261#comment-16883261
 ] 

Stanislav Kozlovski commented on KAFKA-8648:


You might want to assign yourself to this, [~astubbs]

> Console tools should fail fast if an unrecognised --property is passed in
> -
>
> Key: KAFKA-8648
> URL: https://issues.apache.org/jira/browse/KAFKA-8648
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer, producer 
>Affects Versions: 2.3.0
>Reporter: Antony Stubbs
>Priority: Major
>  Labels: cli, console
>
> Some people don't realise that the --property flag for the console producer 
> and consumer is for passing options to the formatter, not to the actual 
> producer or consumer.
> If someone who doesn't know this passes a property intending it to be a 
> consumer or producer property, it is passed to the formatter instead; if it 
> is not a recognised formatter option, it fails silently and the user is left 
> wondering what happened to their intended setting.
> The console tools otherwise fail on unrecognised arguments; it would be great 
> if the default formatter also failed with a warning about the unrecognised 
> option.
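
The proposed fail-fast check could look like this sketch (the property names
are a subset of the default formatter's real options; the validation code
itself is illustrative):

```python
# Known options of the console consumer's default formatter (subset).
KNOWN_FORMATTER_PROPS = {
    "print.key", "print.value", "print.timestamp", "print.partition",
    "key.separator", "line.separator",
    "key.deserializer", "value.deserializer",
}

def validate_formatter_props(props):
    """Fail fast instead of silently ignoring unknown --property keys,
    e.g. a consumer config accidentally passed to the wrong flag."""
    unknown = sorted(set(props) - KNOWN_FORMATTER_PROPS)
    if unknown:
        raise ValueError(
            f"Unrecognised formatter properties: {unknown}. "
            "Consumer configs belong in --consumer-property.")
    return True
```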



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)

2019-07-11 Thread Srikala (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883201#comment-16883201
 ] 

Srikala commented on KAFKA-7500:


[~ryannedolan]  translateOffsets() has worked for both failover and fallback 
scenarios. Thanks for the info. Another request: for an XDCR Active/Active 
layout as in one of your slides (attached here), can you suggest a preferred 
way of deploying MM2 clusters to support the replication? Thanks again. 
!Active-Active XDCR setup.png!

> MirrorMaker 2.0 (KIP-382)
> -
>
> Key: KAFKA-7500
> URL: https://issues.apache.org/jira/browse/KAFKA-7500
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect, mirrormaker
>Reporter: Ryanne Dolan
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: Active-Active XDCR setup.png
>
>
> Implement a drop-in replacement for MirrorMaker leveraging the Connect 
> framework.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0]
> [https://github.com/apache/kafka/pull/6295]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)

2019-07-11 Thread Srikala (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikala updated KAFKA-7500:
---
Attachment: Active-Active XDCR setup.png

> MirrorMaker 2.0 (KIP-382)
> -
>
> Key: KAFKA-7500
> URL: https://issues.apache.org/jira/browse/KAFKA-7500
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect, mirrormaker
>Reporter: Ryanne Dolan
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: Active-Active XDCR setup.png
>
>
> Implement a drop-in replacement for MirrorMaker leveraging the Connect 
> framework.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0]
> [https://github.com/apache/kafka/pull/6295]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-8653) Regression in JoinGroup v0 rebalance timeout handling

2019-07-11 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8653.

   Resolution: Fixed
Fix Version/s: 2.3.1

> Regression in JoinGroup v0 rebalance timeout handling
> -
>
> Key: KAFKA-8653
> URL: https://issues.apache.org/jira/browse/KAFKA-8653
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.3.1
>
>
> The rebalance timeout was added to the JoinGroup protocol in version 1. Prior 
> to 2.3, we handled version 0 JoinGroup requests by setting the rebalance 
> timeout to be equal to the session timeout. We lost this logic when we 
> converted the API to use the generated schema definition which uses the 
> default value of -1. The impact of this is that the group rebalance timeout 
> becomes 0, so rebalances finish immediately after we enter the 
> PrepareRebalance state and kick out all old members. This causes consumer 
> groups to enter an endless rebalance loop.
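
The pre-2.3 defaulting that the fix restores can be sketched as follows
(illustrative, not the actual broker code):

```python
def effective_rebalance_timeout(version, rebalance_timeout_ms,
                                session_timeout_ms):
    """JoinGroup v0 carries no rebalance timeout, so the generated
    schema defaults it to -1. Fall back to the session timeout as
    pre-2.3 brokers did, instead of letting it collapse to 0 and
    triggering an endless rebalance loop."""
    if version == 0 or rebalance_timeout_ms < 0:
        return session_timeout_ms
    return rebalance_timeout_ms
```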



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8657) Automatic Topic Creation on Producer

2019-07-11 Thread Justine Olshan (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan updated KAFKA-8657:
--
Description: 
Kafka has a feature that allows for the auto-creation of topics. Usually this 
is done through a metadata request to the broker. KIP 487 aims to give the 
producer the functionality to auto-create topics through a separate request. 

See 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer]
 

  was:
Kafka has a feature that allows for the auto-creation of topics. Usually this 
is done through a metadata request to the broker. KIP 487 aims to give the 
producer this functionality. 

See 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer]
 


> Automatic Topic Creation on Producer
> 
>
> Key: KAFKA-8657
> URL: https://issues.apache.org/jira/browse/KAFKA-8657
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Minor
>
> Kafka has a feature that allows for the auto-creation of topics. Usually this 
> is done through a metadata request to the broker. KIP 487 aims to give the 
> producer the functionality to auto-create topics through a separate request. 
> See 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8657) Automatic Topic Creation on Producer

2019-07-11 Thread Justine Olshan (JIRA)
Justine Olshan created KAFKA-8657:
-

 Summary: Automatic Topic Creation on Producer
 Key: KAFKA-8657
 URL: https://issues.apache.org/jira/browse/KAFKA-8657
 Project: Kafka
  Issue Type: Improvement
Reporter: Justine Olshan
Assignee: Justine Olshan


Kafka has a feature that allows for the auto-creation of topics. Usually this 
is done through a metadata request to the broker. KIP 487 aims to give the 
producer this functionality. 

See 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer]
 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8644) The Kafka protocol generator should allow null defaults for bytes and array fields

2019-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883170#comment-16883170
 ] 

ASF GitHub Bot commented on KAFKA-8644:
---

gwenshap commented on pull request #7059: KAFKA-8644. The Kafka protocol 
generator should allow null defaults for bytes and array fields
URL: https://github.com/apache/kafka/pull/7059
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> The Kafka protocol generator should allow null defaults for bytes and array 
> fields
> --
>
> Key: KAFKA-8644
> URL: https://issues.apache.org/jira/browse/KAFKA-8644
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>Priority: Major
> Fix For: 2.4.0
>
>
> The Kafka protocol generator should allow null defaults for bytes and array 
> fields.  Currently, null defaults are only allowed for string fields.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-8644) The Kafka protocol generator should allow null defaults for bytes and array fields

2019-07-11 Thread Gwen Shapira (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved KAFKA-8644.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

> The Kafka protocol generator should allow null defaults for bytes and array 
> fields
> --
>
> Key: KAFKA-8644
> URL: https://issues.apache.org/jira/browse/KAFKA-8644
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>Priority: Major
> Fix For: 2.4.0
>
>
> The Kafka protocol generator should allow null defaults for bytes and array 
> fields.  Currently, null defaults are only allowed for string fields.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8656) Kafka Consumer Record Latency Metric

2019-07-11 Thread Sean Glover (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Glover updated KAFKA-8656:
---
Description: 
Consumer lag is a useful metric for monitoring how many records are queued to 
be processed.  We can look at individual lag per partition or we may aggregate 
metrics. For example, we may want to monitor the maximum lag of any partition 
in our consumer subscription so we can identify hot partitions caused by a 
suboptimal producer partitioning strategy.  We may want to monitor the sum of 
lag across all partitions so we have a sense of our total backlog of messages 
to consume. Lag in offsets is useful when you have a good understanding of 
your messages and processing characteristics, but it doesn’t tell us how far 
behind _in time_ we are.  This is known as wait time in queueing theory, or 
more informally it’s referred to as latency.

The latency of a message can be defined as the time between when that message 
was first produced and when it is received by a consumer.  The 
latency of records in a partition correlates with lag, but a larger lag doesn’t 
necessarily mean a larger latency. For example, a topic consumed by two 
separate application consumer groups A and B may have similar lag, but 
different latency per partition.  Application A is a consumer which performs 
CPU intensive business logic on each message it receives. It’s distributed 
across many consumer group members to handle the load quickly enough, but since 
its processing time is slower, it takes longer to process each message per 
partition.  Meanwhile, Application B is a consumer which performs a simple ETL 
operation to land streaming data in another system, such as HDFS. It may have 
similar lag to Application A, but because it has a faster processing time its 
latency per partition is significantly less.

If the Kafka Consumer reported a latency metric it would be easier to build 
Service Level Agreements (SLAs) based on non-functional requirements of the 
streaming system.  For example, the system must never have a latency of greater 
than 10 minutes. This SLA could be used in monitoring alerts or as input to 
automatic scaling solutions.

[KIP-488|https://cwiki.apache.org/confluence/display/KAFKA/488%3A+Kafka+Consumer+Record+Latency+Metric]
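
That definition reduces to simple arithmetic over record timestamps (a sketch,
assuming the record's create timestamp is available to the consumer):

```python
def record_latency_ms(record_timestamp_ms, receive_time_ms):
    """Latency = time between when a record was produced and when the
    consumer receives it, independent of the lag in offsets."""
    return receive_time_ms - record_timestamp_ms

def max_partition_latency_ms(latest_latency_by_partition):
    """Aggregate for an SLA such as 'never more than 10 minutes behind':
    the worst latency across the subscribed partitions."""
    return max(latest_latency_by_partition.values(), default=0)
```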

 

  was:
Consumer lag is a useful metric to monitor how many records are queued to be 
processed.  We can look at individual lag per partition or we may aggregate 
metrics. For example, we may want to monitor what the maximum lag of any 
particular partition in our consumer subscription so we can identify hot 
partitions, caused by an insufficient producing partitioning strategy.  We may 
want to monitor a sum of lag across all partitions so we have a sense as to our 
total backlog of messages to consume. Lag in offsets is useful when you have a 
good understanding of your messages and processing characteristics, but it 
doesn’t tell us how far behind _in time_ we are.  This is known as wait time in 
queueing theory, or more informally it’s referred to as latency.

The latency of a message can be defined as the difference between when that 
message was first produced to when the message is received by a consumer.  The 
latency of records in a partition correlates with lag, but a larger lag doesn’t 
necessarily mean a larger latency. For example, a topic consumed by two 
separate application consumer groups A and B may have similar lag, but 
different latency per partition.  Application A is a consumer which performs 
CPU intensive business logic on each message it receives. It’s distributed 
across many consumer group members to handle the load quickly enough, but since 
its processing time is slower, it takes longer to process each message per 
partition.  Meanwhile, Application B is a consumer which performs a simple ETL 
operation to land streaming data in another system, such as HDFS. It may have 
similar lag to Application A, but because it has a faster processing time its 
latency per partition is significantly less.

If the Kafka Consumer reported a latency metric it would be easier to build 
Service Level Agreements (SLAs) based on non-functional requirements of the 
streaming system.  For example, the system must never have a latency of greater 
than 10 minutes. This SLA could be used in monitoring alerts or as input to 
automatic scaling solutions.

[KIP-488|https://cwiki.apache.org/confluence/display/KAFKA/488%3A+Kafka+Consumer+Record+Latency+Metric]

 


> Kafka Consumer Record Latency Metric
> 
>
> Key: KAFKA-8656
> URL: https://issues.apache.org/jira/browse/KAFKA-8656
> Project: Kafka
>  Issue Type: New Feature
>  Components: metrics
>Reporter: Sean Glover
>Assignee: Sean Glover
>Priority: Major
>
> Cons

[jira] [Created] (KAFKA-8656) Kafka Consumer Record Latency Metric

2019-07-11 Thread Sean Glover (JIRA)
Sean Glover created KAFKA-8656:
--

 Summary: Kafka Consumer Record Latency Metric
 Key: KAFKA-8656
 URL: https://issues.apache.org/jira/browse/KAFKA-8656
 Project: Kafka
  Issue Type: New Feature
  Components: metrics
Reporter: Sean Glover
Assignee: Sean Glover


Consumer lag is a useful metric to monitor how many records are queued to be 
processed.  We can look at individual lag per partition or we may aggregate 
metrics. For example, we may want to monitor the maximum lag of any 
particular partition in our consumer subscription so that we can identify hot 
partitions caused by an ineffective producer partitioning strategy.  We may 
want to monitor the sum of lag across all partitions so that we have a sense of 
our total backlog of messages to consume. Lag in offsets is useful when you have a 
good understanding of your messages and processing characteristics, but it 
doesn’t tell us how far behind _in time_ we are.  This is known as wait time in 
queueing theory, or more informally it’s referred to as latency.

The latency of a message can be defined as the difference between when that 
message was first produced to when the message is received by a consumer.  The 
latency of records in a partition correlates with lag, but a larger lag doesn’t 
necessarily mean a larger latency. For example, a topic consumed by two 
separate application consumer groups A and B may have similar lag, but 
different latency per partition.  Application A is a consumer which performs 
CPU intensive business logic on each message it receives. It’s distributed 
across many consumer group members to handle the load quickly enough, but since 
its processing time is slower, it takes longer to process each message per 
partition.  Meanwhile, Application B is a consumer which performs a simple ETL 
operation to land streaming data in another system, such as HDFS. It may have 
similar lag to Application A, but because it has a faster processing time its 
latency per partition is significantly less.

If the Kafka Consumer reported a latency metric it would be easier to build 
Service Level Agreements (SLAs) based on non-functional requirements of the 
streaming system.  For example, the system must never have a latency of greater 
than 10 minutes. This SLA could be used in monitoring alerts or as input to 
automatic scaling solutions.

[KIP-488|https://cwiki.apache.org/confluence/display/KAFKA/488%3A+Kafka+Consumer+Record+Latency+Metric]
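The latency definition above can be sketched concretely. The following plain-Java sketch is illustrative only (it is not the KIP-488 implementation; the per-partition max aggregation and all names are assumptions, and it presumes producer and consumer clocks are reasonably synchronized): per-record latency is the receive time minus the record's produce timestamp.

```java
// Sketch: per-partition record latency = receive time - produce timestamp.
// Mirrors the idea in KIP-488; this is NOT the actual Kafka implementation.
import java.util.HashMap;
import java.util.Map;

public class LatencyDemo {

    // Latency of a single record, assuming reasonably synchronized clocks.
    static long recordLatencyMs(long produceTimestampMs, long receiveTimeMs) {
        return receiveTimeMs - produceTimestampMs;
    }

    // Track the maximum observed latency per partition.
    static Map<String, Long> maxLatency = new HashMap<>();

    static void observe(String topicPartition, long produceTsMs, long receiveTsMs) {
        long latency = recordLatencyMs(produceTsMs, receiveTsMs);
        maxLatency.merge(topicPartition, latency, Math::max);
    }

    public static void main(String[] args) {
        // Two records on the same partition; the older one dominates the max.
        observe("events-0", 10_000L, 12_000L); // 2000 ms behind
        observe("events-0", 10_000L, 17_000L); // 7000 ms behind
        System.out.println("max latency events-0 = " + maxLatency.get("events-0") + " ms");
        // prints: max latency events-0 = 7000 ms
    }
}
```

A real metric would read the produce timestamp from ConsumerRecord's CreateTime and report through the client metrics registry; the aggregation choice (max vs. average per partition) is exactly the kind of detail the KIP discusses.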

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8629) Kafka Streams Apps to support small native images through GraalVM

2019-07-11 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883135#comment-16883135
 ] 

Matthias J. Sax commented on KAFKA-8629:


[~muirandy] – I added you to the list of contributors and assigned the ticket 
to you. You can now also self-assign tickets.

> Kafka Streams Apps to support small native images through GraalVM
> -
>
> Key: KAFKA-8629
> URL: https://issues.apache.org/jira/browse/KAFKA-8629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
> Environment: OSX
> Linux on Docker
>Reporter: Andy Muir
>Assignee: Andy Muir
>Priority: Minor
>
> I'm investigating using [GraalVM|http://example.com/] to help with reducing 
> docker image size and required resources for a simple Kafka Streams 
> microservice. To this end, I'm looking at running a microservice which:
> 1) consumes from a Kafka topic (XML)
> 2) Transforms into JSON
> 3) Produces to a new Kafka topic.
> The Kafka Streams app running in the JVM works fine.
> When I attempt to build it as a GraalVM native image (a binary executable which 
> does not require the JVM, hence a smaller image size and fewer resources), I 
> encountered a few 
> [incompatibilities|https://github.com/oracle/graal/blob/master/substratevm/LIMITATIONS.md]
>  with the source code in Kafka.
> I've implemented a workaround for each of these in a fork (link to follow) to 
> help establish if it is feasible. I don't intend (at this stage) for the 
> changes to be applied to the broker - I'm only after Kafka Streams for now. 
> I'm not sure whether it'd be a good idea for the broker itself to run as a 
> native image!
> There were 2 issues getting the native image with kafka streams:
> 1) Some Reflection use cases using MethodHandle
> 2) Anything JMX
> To work around these issues, I have:
> 1) Replaced use of MethodHandle with alternatives
> 2) Commented out the JMX code in a few places
> While the first may be sustainable, I'd expect the second change to be 
> put behind a configuration switch, so that the existing code is used by 
> default and JMX is turned off only when configured.
> *I haven't created a PR for now, as I'd like feedback to decide if it is 
> going to be feasible to carry this forwards.*
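The configuration switch suggested above might look roughly like the following hypothetical sketch. The property name "jmx.enabled" and the helper method are assumptions, not actual Kafka code; the point is that when the flag is off, no JMX registration is attempted, which is the behavior a native image needs.

```java
// Hypothetical sketch of gating JMX registration behind a config switch.
// The "jmx.enabled" property and maybeRegister() are illustrative only.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxSwitchDemo {

    static final boolean JMX_ENABLED =
            Boolean.parseBoolean(System.getProperty("jmx.enabled", "true"));

    // Register an MBean only when JMX is enabled; return whether we did.
    static boolean maybeRegister(boolean jmxEnabled, Object mbean, String name) {
        if (!jmxEnabled) {
            return false; // native-image friendly path: skip JMX entirely
        }
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(mbean, new ObjectName(name));
            return true;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("jmx.enabled = " + JMX_ENABLED);
        // With the switch off, nothing JMX-related runs:
        System.out.println("registered: " + maybeRegister(false, new Object(), "demo:type=Noop"));
        // prints: registered: false
    }
}
```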



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (KAFKA-8629) Kafka Streams Apps to support small native images through GraalVM

2019-07-11 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-8629:
--

Assignee: Andy Muir

> Kafka Streams Apps to support small native images through GraalVM
> -
>
> Key: KAFKA-8629
> URL: https://issues.apache.org/jira/browse/KAFKA-8629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
> Environment: OSX
> Linux on Docker
>Reporter: Andy Muir
>Assignee: Andy Muir
>Priority: Minor
>
> I'm investigating using [GraalVM|http://example.com/] to help with reducing 
> docker image size and required resources for a simple Kafka Streams 
> microservice. To this end, I'm looking at running a microservice which:
> 1) consumes from a Kafka topic (XML)
> 2) Transforms into JSON
> 3) Produces to a new Kafka topic.
> The Kafka Streams app running in the JVM works fine.
> When I attempt to build it as a GraalVM native image (a binary executable which 
> does not require the JVM, hence a smaller image size and fewer resources), I 
> encountered a few 
> [incompatibilities|https://github.com/oracle/graal/blob/master/substratevm/LIMITATIONS.md]
>  with the source code in Kafka.
> I've implemented a workaround for each of these in a fork (link to follow) to 
> help establish if it is feasible. I don't intend (at this stage) for the 
> changes to be applied to the broker - I'm only after Kafka Streams for now. 
> I'm not sure whether it'd be a good idea for the broker itself to run as a 
> native image!
> There were 2 issues getting the native image with kafka streams:
> 1) Some Reflection use cases using MethodHandle
> 2) Anything JMX
> To work around these issues, I have:
> 1) Replaced use of MethodHandle with alternatives
> 2) Commented out the JMX code in a few places
> While the first may be sustainable, I'd expect the second change to be 
> put behind a configuration switch, so that the existing code is used by 
> default and JMX is turned off only when configured.
> *I haven't created a PR for now, as I'd like feedback to decide if it is 
> going to be feasible to carry this forwards.*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-7862) Modify JoinGroup logic to incorporate group.instance.id change

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-7862:

Component/s: consumer

> Modify JoinGroup logic to incorporate group.instance.id change
> --
>
> Key: KAFKA-7862
> URL: https://issues.apache.org/jira/browse/KAFKA-7862
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
> Fix For: 2.3.0
>
>
> Step one of the KIP-345 JoinGroup logic change to cooperate with static 
> membership.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8484) ProducerId reset can cause IllegalStateException

2019-07-11 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883132#comment-16883132
 ] 

Andrew Olson commented on KAFKA-8484:
-

https://github.com/apache/kafka/pull/6883

> ProducerId reset can cause IllegalStateException
> 
>
> Key: KAFKA-8484
> URL: https://issues.apache.org/jira/browse/KAFKA-8484
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.0
>
>
> If the producerId is reset while inflight requests are pending, we can get 
> the following uncaught error.
> {code}
> [2019-06-03 08:20:45,320] ERROR [Producer clientId=producer-1] Uncaught error 
> in request completion: (org.apache.kafka.clients.NetworkClient)   
>   
>   
> java.lang.IllegalStateException: Sequence number for partition test_topic-13 
> is going to become negative : -965
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:561)
>   
> 
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:744)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:717)
> at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:667)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:574)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:75)
> at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:818)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:561)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:553)
> at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:253)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The impact is that a failed batch will not be completed until the 
> delivery timeout is exceeded. When we receive a produce response, we are 
> missing validation that the producerId and epoch still match.
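The missing check described above can be sketched in a few lines. This is a minimal, self-contained illustration (names are invented; it is not the actual Sender/TransactionManager code): before acting on a produce response, confirm the batch's producerId and epoch still match the producer's current state, and ignore stale responses instead of adjusting sequence bookkeeping with them.

```java
// Sketch of validating producerId/epoch on a produce response, so a
// response for a batch sent before a producerId reset is ignored.
// All names here are illustrative, not actual Kafka client code.
public class EpochCheckDemo {

    static long currentProducerId = 4000L;
    static short currentEpoch = 2;

    // True only if the response belongs to the current producerId/epoch.
    static boolean responseStillValid(long batchProducerId, short batchEpoch) {
        return batchProducerId == currentProducerId && batchEpoch == currentEpoch;
    }

    public static void main(String[] args) {
        System.out.println(responseStillValid(4000L, (short) 2)); // current -> true
        System.out.println(responseStillValid(3999L, (short) 2)); // stale id -> false
    }
}
```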



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8484) ProducerId reset can cause IllegalStateException

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-8484:

Component/s: producer 

> ProducerId reset can cause IllegalStateException
> 
>
> Key: KAFKA-8484
> URL: https://issues.apache.org/jira/browse/KAFKA-8484
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.0
>
>
> If the producerId is reset while inflight requests are pending, we can get 
> the following uncaught error.
> {code}
> [2019-06-03 08:20:45,320] ERROR [Producer clientId=producer-1] Uncaught error 
> in request completion: (org.apache.kafka.clients.NetworkClient)   
>   
>   
> java.lang.IllegalStateException: Sequence number for partition test_topic-13 
> is going to become negative : -965
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.adjustSequencesDueToFailedBatch(TransactionManager.java:561)
>   
> 
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:744)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:717)
> at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:667)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:574)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:75)
> at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:818)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:561)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:553)
> at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:253)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The impact is that a failed batch will not be completed until the 
> delivery timeout is exceeded. When we receive a produce response, we are 
> missing validation that the producerId and epoch still match.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8463) Fix redundant reassignment of tasks when leader worker leaves

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-8463:

Component/s: KafkaConnect

> Fix redundant reassignment of tasks when leader worker leaves
> -
>
> Key: KAFKA-8463
> URL: https://issues.apache.org/jira/browse/KAFKA-8463
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.3.0
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Blocker
> Fix For: 2.3.0
>
>
> There's a bug in the computation of the new assignment for connectors and tasks 
> in {{IncrementalCooperativeAssignor}} that may lead to duplicate assignments. 
> Existing assignments should always be considered, because if the leader 
> worker bounces there's no history of previous assignments.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8483) Possible reordering of messages by producer after UNKNOWN_PRODUCER_ID error

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-8483:

Component/s: producer 

> Possible reordering of messages by producer after UNKNOWN_PRODUCER_ID error
> ---
>
> Key: KAFKA-8483
> URL: https://issues.apache.org/jira/browse/KAFKA-8483
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.0
>
>
> The idempotent producer attempts to detect spurious UNKNOWN_PRODUCER_ID 
> errors and handle them by reassigning sequence numbers to the inflight 
> batches. The inflight batches are tracked in a PriorityQueue. The problem is 
> that the reassignment of sequence numbers depends on the iteration order of 
> PriorityQueue, which does not guarantee any ordering. So this can result in 
> sequence numbers being assigned in the wrong order.
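The PriorityQueue pitfall described above is easy to demonstrate in isolation: the queue's iterator reflects its internal heap layout, while only repeated poll() returns elements in priority order. Assigning sequence numbers in iteration order is therefore unsafe.

```java
// Demonstrates that java.util.PriorityQueue's iterator does NOT visit
// elements in priority order; only poll() drains in sorted order.
import java.util.Arrays;
import java.util.PriorityQueue;

public class PqOrderDemo {

    // Drain the queue with poll(), which does follow priority order.
    static String polledOrder(PriorityQueue<Integer> pq) {
        StringBuilder sb = new StringBuilder();
        while (!pq.isEmpty()) sb.append(pq.poll());
        return sb.toString();
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(Arrays.asList(5, 1, 4, 2, 3));

        // toString() uses the iterator, i.e. raw heap layout, not sorted order.
        System.out.println("iteration order: " + pq);

        System.out.println("poll order:      " + polledOrder(pq)); // 12345
    }
}
```

The fix implied by the ticket is to sort (or drain in poll order) before reassigning sequence numbers, rather than relying on iteration order.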



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8344) Fix vagrant-up.sh to work with AWS properly

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-8344:

Component/s: system tests

> Fix vagrant-up.sh to work with AWS properly
> ---
>
> Key: KAFKA-8344
> URL: https://issues.apache.org/jira/browse/KAFKA-8344
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 2.3.0
>
>
> I tried to run {{vagrant/vagrant-up.sh --aws}} with the following 
> Vagrantfile.local.
> {code}
> enable_dns = true
> enable_hostmanager = false
> # EC2
> ec2_access_key = ""
> ec2_secret_key = ""
> ec2_keypair_name = "keypair"
> ec2_keypair_file = "/path/to/keypair/file"
> ec2_region = "ap-northeast-1"
> ec2_ami = "ami-0905ffddadbfd01b7"
> ec2_security_groups = "sg-"
> ec2_subnet_id = "subnet-"
> {code}
> EC2 instances were successfully created, but it failed with the following 
> error after that.
> {code}
> $ vagrant/vagrant-up.sh --aws
> (snip)
> An active machine was found with a different provider. Vagrant
> currently allows each machine to be brought up with only a single
> provider at a time. A future version will remove this limitation.
> Until then, please destroy the existing machine to up with a new
> provider.
> Machine name: zk1
> Active provider: aws
> Requested provider: virtualbox
> {code}
> It seems that the {{vagrant hostmanager}} command also requires the 
> {{--provider=aws}} option, in addition to {{vagrant up}}.
> With that option, it succeeded as follows:
> {code}
> $ git diff
> diff --git a/vagrant/vagrant-up.sh b/vagrant/vagrant-up.sh
> index 6a4ef9564..9210a5357 100755
> --- a/vagrant/vagrant-up.sh
> +++ b/vagrant/vagrant-up.sh
> @@ -220,7 +220,7 @@ function bring_up_aws {
>  # We still have to bring up zookeeper/broker nodes serially
>  echo "Bringing up zookeeper/broker machines serially"
>  vagrant up --provider=aws --no-parallel --no-provision 
> $zk_broker_machines $debug
> -vagrant hostmanager
> +vagrant hostmanager --provider=aws
>  vagrant provision
>  fi
> @@ -231,11 +231,11 @@ function bring_up_aws {
>  local vagrant_rsync_temp_dir=$(mktemp -d);
>  TMPDIR=$vagrant_rsync_temp_dir vagrant_batch_command "vagrant up 
> $debug --provider=aws" "$worker_machines" "$max_parallel"
>  rm -rf $vagrant_rsync_temp_dir
> -vagrant hostmanager
> +vagrant hostmanager --provider=aws
>  fi
>  else
>  vagrant up --provider=aws --no-parallel --no-provision $debug
> -vagrant hostmanager
> +vagrant hostmanager --provider=aws
>  vagrant provision
>  fi
> $ vagrant/vagrant-up.sh --aws
> (snip)
> ==> broker3: Running provisioner: shell...
> broker3: Running: /tmp/vagrant-shell20190509-25399-8f1wgz.sh
> broker3: Killing server
> broker3: No kafka server to stop
> broker3: Starting server
> $ vagrant status
> Current machine states:
> zk1   running (aws)
> broker1   running (aws)
> broker2   running (aws)
> broker3   running (aws)
> This environment represents multiple VMs. The VMs are all listed
> above with their current state. For more information about a specific
> VM, run `vagrant status NAME`.
> $ vagrant ssh broker1
> (snip)
> ubuntu@ip-172-16-0-62:~$ /opt/kafka-dev/bin/kafka-topics.sh 
> --bootstrap-server broker1:9092,broker2:9092,broker3:9092 --create 
> --partitions 1 --replication-factor 3 --topic sandbox
> (snip)
> ubuntu@ip-172-16-0-62:~$ /opt/kafka-dev/bin/kafka-topics.sh 
> --bootstrap-server broker1:9092,broker2:9092,broker3:9092 --list
> (snip)
> sandbox
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8324) User constructed RocksObjects leak memory

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-8324:

Component/s: streams

> User constructed RocksObjects leak memory
> -
>
> Key: KAFKA-8324
> URL: https://issues.apache.org/jira/browse/KAFKA-8324
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Sophie Blee-Goldman
>Assignee: Sophie Blee-Goldman
>Priority: Major
>  Labels: kip
> Fix For: 2.3.0
>
>
> Some of the RocksDB options a user can set when extending RocksDBConfigSetter 
> take Rocks objects as parameters. Many of these – including potentially large 
> objects like Cache and Filter – inherit from AbstractNativeReference and must 
> be closed explicitly in order to free the memory of the backing C++ object. 
> However, the user has no way to close any objects they have created in 
> RocksDBConfigSetter, and we never close them on the user's behalf. 
> KIP-453: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-453%3A+Add+close%28%29+method+to+RocksDBConfigSetter]
>  
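The KIP-453 fix can be sketched with stand-in types. Real code would hold org.rocksdb objects (Cache, Filter) that extend AbstractNativeReference; the NativeRef class below is a placeholder so the example compiles without the RocksDB dependency, and the MyConfigSetter shape is illustrative rather than the actual interface.

```java
// Sketch of the KIP-453 idea: a config setter keeps references to the
// native-backed objects it allocates and frees them in close().
// NativeRef stands in for org.rocksdb.AbstractNativeReference subclasses.
public class ConfigSetterCloseDemo {

    // Stand-in for a native-backed object (e.g. org.rocksdb.Cache).
    static class NativeRef implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; } // would free the C++ object
    }

    static class MyConfigSetter implements AutoCloseable {
        final NativeRef cache = new NativeRef();   // allocated in setConfig(...)
        final NativeRef filter = new NativeRef();

        // KIP-453: Streams invokes close() when the store closes, so the
        // user can release whatever they allocated in setConfig().
        @Override public void close() {
            cache.close();
            filter.close();
        }
    }

    public static void main(String[] args) {
        MyConfigSetter setter = new MyConfigSetter();
        setter.close();
        System.out.println("cache closed: " + setter.cache.closed);
        // prints: cache closed: true
    }
}
```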



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8275) NetworkClient leastLoadedNode selection should consider throttled nodes

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-8275:

Component/s: clients

> NetworkClient leastLoadedNode selection should consider throttled nodes
> ---
>
> Key: KAFKA-8275
> URL: https://issues.apache.org/jira/browse/KAFKA-8275
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.0
>
>
> The leastLoadedNode() function is intended to find any available node. It is 
> smart in the sense that it considers the number of inflight requests and 
> reconnect backoff, but it has not been updated to take into account client 
> throttling. If we have an available node which is not throttled, we should 
> use it.
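A simplified sketch of the proposed selection rule (illustrative names only, not the NetworkClient implementation): prefer an unthrottled node with the fewest in-flight requests, and fall back to a throttled node only when every node is throttled.

```java
// Toy version of leastLoadedNode() that accounts for client throttling.
// Not Kafka code: Node and the selection policy are simplified stand-ins.
import java.util.Arrays;
import java.util.List;

public class LeastLoadedDemo {

    static class Node {
        final String id;
        final int inflight;        // in-flight requests to this node
        final long throttledUntil; // ms timestamp; 0 = not throttled
        Node(String id, int inflight, long throttledUntil) {
            this.id = id;
            this.inflight = inflight;
            this.throttledUntil = throttledUntil;
        }
    }

    // Prefer an unthrottled node with the fewest in-flight requests.
    static Node leastLoaded(List<Node> nodes, long nowMs) {
        Node best = null;
        for (Node n : nodes) {
            if (n.throttledUntil > nowMs) continue; // skip throttled nodes
            if (best == null || n.inflight < best.inflight) best = n;
        }
        if (best == null && !nodes.isEmpty()) best = nodes.get(0); // all throttled
        return best;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("a", 1, 10_000L), // throttled until t=10s
                new Node("b", 3, 0L),
                new Node("c", 2, 0L));
        // "a" has the fewest in-flight requests but is throttled at t=5s,
        // so "c" wins over "b" on in-flight count.
        System.out.println(leastLoaded(nodes, 5_000L).id); // prints: c
    }
}
```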



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8275) NetworkClient leastLoadedNode selection should consider throttled nodes

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-8275:

Component/s: network

> NetworkClient leastLoadedNode selection should consider throttled nodes
> ---
>
> Key: KAFKA-8275
> URL: https://issues.apache.org/jira/browse/KAFKA-8275
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, network
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.0
>
>
> The leastLoadedNode() function is intended to find any available node. It is 
> smart in the sense that it considers the number of inflight requests and 
> reconnect backoff, but it has not been updated to take into account client 
> throttling. If we have an available node which is not throttled, we should 
> use it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-7974) KafkaAdminClient loses worker thread/enters zombie state when initial DNS lookup fails

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-7974:

Component/s: admin

> KafkaAdminClient loses worker thread/enters zombie state when initial DNS 
> lookup fails
> --
>
> Key: KAFKA-7974
> URL: https://issues.apache.org/jira/browse/KAFKA-7974
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Nicholas Parker
>Priority: Major
> Fix For: 2.3.0, 2.1.2, 2.2.1
>
>
> Version: kafka-clients-2.1.0
> I have some code that creates creates a KafkaAdminClient instance and then 
> invokes listTopics(). I was seeing the following stacktrace in the logs, 
> after which the KafkaAdminClient instance became unresponsive:
> {code:java}
> ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 
> KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread 
> | adminclient-1':
> java.lang.IllegalStateException: No entry found for connection 0
>     at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330)
>     at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134)
>     at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921)
>     at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
>     at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898)
>     at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113)
>     at java.lang.Thread.run(Thread.java:748){code}
> From looking at the code I was able to trace down a possible cause:
>  * NetworkClient.ready() invokes this.initiateConnect() as seen in the above 
> stacktrace
>  * NetworkClient.initiateConnect() invokes 
> ClusterConnectionStates.connecting(), which internally invokes 
> ClientUtils.resolve() to resolve the host when creating an entry for the 
> connection.
>  * If this host lookup fails, an UnknownHostException can be thrown back to 
> NetworkClient.initiateConnect() and the connection entry is not created in 
> ClusterConnectionStates. This exception doesn't get logged so this is a guess 
> on my part.
>  * NetworkClient.initiateConnect() catches the exception and attempts to call 
> ClusterConnectionStates.disconnected(), which throws an IllegalStateException 
> because no entry had yet been created due to the lookup failure.
>  * This IllegalStateException ends up killing the worker thread and 
> KafkaAdminClient gets stuck, never returning from listTopics().
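The failure chain above reduces to calling disconnected() for a connection whose state entry was never created. A toy sketch (not the NetworkClient/ClusterConnectionStates code; names and the map-based state are stand-ins) shows both the strict behavior that produced the log line and the tolerant behavior a fix would need:

```java
// Toy model of per-connection state tracking. disconnectedStrict() mimics
// the reported bug: it throws if connecting() never created an entry
// (e.g. because DNS resolution failed first). disconnectedSafe() shows
// the tolerant alternative. Illustrative only, not Kafka code.
import java.util.HashMap;
import java.util.Map;

public class ConnStateDemo {

    static Map<String, String> state = new HashMap<>(); // nodeId -> state

    // Fragile: assumes an entry was created during connecting().
    static void disconnectedStrict(String id) {
        if (!state.containsKey(id))
            throw new IllegalStateException("No entry found for connection " + id);
        state.put(id, "DISCONNECTED");
    }

    // Defensive: tolerates a missing entry instead of killing the thread.
    static void disconnectedSafe(String id) {
        state.put(id, "DISCONNECTED");
    }

    public static void main(String[] args) {
        // DNS resolution failed before an entry for "0" was ever created:
        try {
            disconnectedStrict("0");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // matches the logged message shape
        }
        disconnectedSafe("0");
        System.out.println(state.get("0")); // prints: DISCONNECTED
    }
}
```

In the real client the uncaught IllegalStateException escapes the admin-client thread, which is why the KafkaAdminClient then hangs forever.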



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-7912) In-memory key-value store does not support concurrent access

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-7912:

Component/s: streams

> In-memory key-value store does not support concurrent access 
> -
>
> Key: KAFKA-7912
> URL: https://issues.apache.org/jira/browse/KAFKA-7912
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Fix For: 2.3.0
>
>
> Currently, the in-memory key-value store uses a Map to store key-value pairs 
> and fetches them by calling subMap and returning an iterator to this submap. 
> This is unsafe as the submap is just a view of the original map and there is 
> risk of concurrent access.
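The hazard is reproducible even single-threaded, because subMap returns a live, fail-fast view of the backing map: any structural change to the store while iterating the view throws ConcurrentModificationException. (With true multi-threaded access the behavior is undefined rather than reliably fail-fast.)

```java
// Demonstrates that TreeMap.subMap() is a live view: mutating the backing
// map while iterating the view fails fast, even on one thread.
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class SubMapViewDemo {

    static boolean mutationDuringIterationFails() {
        TreeMap<Integer, String> store = new TreeMap<>();
        for (int k = 1; k <= 5; k++) store.put(k, "v" + k);

        Iterator<Map.Entry<Integer, String>> it = store.subMap(1, 4).entrySet().iterator();
        it.next();
        store.put(6, "v6"); // structural change to the backing map
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the view's iterator detected the modification
        }
    }

    public static void main(String[] args) {
        System.out.println("fails fast: " + mutationDuringIterationFails());
        // prints: fails fast: true
    }
}
```

The safe pattern the ticket points toward is returning an iterator over a copy of the range (or backing the store with a concurrent map) rather than over the live view.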



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-7831) Consumer SubscriptionState missing synchronization

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-7831:

Component/s: consumer

> Consumer SubscriptionState missing synchronization
> --
>
> Key: KAFKA-7831
> URL: https://issues.apache.org/jira/browse/KAFKA-7831
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.0
>
>
> ConsumerCoordinator installs a Metadata.Listener in order to update pattern 
> subscriptions after metadata changes. The listener is invoked from 
> NetworkClient.poll, which could happen in the heartbeat thread. Currently, 
> however, there is no synchronization in SubscriptionState to make this safe. 
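A minimal sketch of the kind of synchronization needed (illustrative; the real SubscriptionState API is much larger and the eventual fix may differ): every access from the consumer thread and the heartbeat thread goes through the same lock, and reads hand out defensive copies rather than the shared set.

```java
// Toy model of shared subscription state accessed by two threads
// (consumer poll loop and heartbeat thread). Synchronized methods share
// one lock; snapshot() returns a copy so callers never iterate shared
// state unguarded. Illustrative only, not the SubscriptionState class.
import java.util.HashSet;
import java.util.Set;

public class SubscriptionSyncDemo {

    private final Set<String> subscription = new HashSet<>();

    // Called from either thread; guarded by this object's monitor.
    synchronized void subscribe(String topic) { subscription.add(topic); }

    // Defensive copy: safe to iterate outside the lock.
    synchronized Set<String> snapshot() { return new HashSet<>(subscription); }

    public static void main(String[] args) throws Exception {
        SubscriptionSyncDemo s = new SubscriptionSyncDemo();
        Thread heartbeat = new Thread(() -> s.subscribe("metrics")); // heartbeat-thread update
        s.subscribe("events");                                        // consumer-thread update
        heartbeat.start();
        heartbeat.join();
        System.out.println(s.snapshot().size()); // prints: 2
    }
}
```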



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-3143) inconsistent state in ZK when all replicas are dead

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3143:

Component/s: controller

> inconsistent state in ZK when all replicas are dead
> ---
>
> Key: KAFKA-3143
> URL: https://issues.apache.org/jira/browse/KAFKA-3143
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Jun Rao
>Assignee: Ismael Juma
>Priority: Major
>  Labels: reliability
> Fix For: 2.3.0
>
>
> This issue can be recreated in the following steps.
> 1. Start 3 brokers, 1, 2 and 3.
> 2. Create a topic with a single partition and 2 replicas, say on broker 1 and 
> 2.
> If we stop both replicas 1 and 2, depending on where the controller is, the 
> leader and isr stored in ZK in the end are different.
> If the controller is on broker 3, what's stored in ZK will be -1 for leader 
> and an empty set for ISR.
> On the other hand, if the controller is on broker 2 and we stop broker 1 
> followed by broker 2, what's stored in ZK will be 2 for leader and 2 for ISR.
> The issue is that in the first case, the controller will call 
> ReplicaStateMachine to transition to OfflineReplica, which will change the 
> leader and isr. However, in the second case, the controller fails over, but 
> we don't transition ReplicaStateMachine to OfflineReplica during controller 
> initialization.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8294) Batch StopReplica requests with partition deletion and add test cases

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-8294:

Component/s: controller

> Batch StopReplica requests with partition deletion and add test cases
> -
>
> Key: KAFKA-8294
> URL: https://issues.apache.org/jira/browse/KAFKA-8294
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.0
>
>
> One of the tricky aspects we found in KAFKA-8237 is the batching of the 
> StopReplica requests. We should have test cases covering expected behavior so 
> that we do not introduce regressions and we should make the batching 
> consistent whether or not `deletePartitions` is set.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-7440) Use leader epoch in consumer fetch requests

2019-07-11 Thread Andrew Olson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-7440:

Component/s: consumer

> Use leader epoch in consumer fetch requests
> ---
>
> Key: KAFKA-7440
> URL: https://issues.apache.org/jira/browse/KAFKA-7440
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: David Arthur
>Priority: Major
>  Labels: kip
> Fix For: 2.3.0
>
>
> This patch adds support in the consumer to use the leader epoch obtained from 
> the metadata in fetch requests: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8629) Kafka Streams Apps to support small native images through GraalVM

2019-07-11 Thread Staffan Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883067#comment-16883067
 ] 

Staffan Olsson edited comment on KAFKA-8629 at 7/11/19 3:08 PM:


Quarkus has a [Streams 
extension|https://quarkus.io/guides/kafka-streams-guide]. For details see 
[https://github.com/quarkusio/quarkus/pull/2693], 
[https://github.com/quarkusio/quarkus/issues/2863] and 
[https://github.com/quarkusio/quarkus/pull/2953]. Note also that IMO there's a 
blocker with Snappy and LZ4 disabled: 
[https://github.com/quarkusio/quarkus/issues/2718].

I've done some work on native builds of topic admin CLI tools, where the 
rewards in terms of reduced startup times really pay off: 
[https://github.com/solsson/dockerfiles/pull/25] ([example 
usage|https://github.com/Yolean/kubernetes-kafka/commit/af63569c41b78195b85c0ff7f919df93b2ef6003]).
 I've only used these builds in testing, but seen no regressions yet. They 
reduce CPU consumption for automated (by shell script) admin tasks by 90% and 
memory consumption by 80%. I did this work before I discovered 
[https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md] which 
would have made things a lot easier :)


was (Author: solsson):
Quarkus has a [Streams 
extension|[https://quarkus.io/guides/kafka-streams-guide]]. For details see 
[https://github.com/quarkusio/quarkus/pull/2693], 
[https://github.com/quarkusio/quarkus/issues/2863] and 
[https://github.com/quarkusio/quarkus/pull/2953.] Note also that IMO there's a 
blocker with Snappy and LZ4 disabled: 
[https://github.com/quarkusio/quarkus/issues/2718].

I've done some work on native builds of topic admin CLI tools, where the 
rewards in terms of reduced startup times really pay off: 
[https://github.com/solsson/dockerfiles/pull/25] ([example 
usage|https://github.com/Yolean/kubernetes-kafka/commit/af63569c41b78195b85c0ff7f919df93b2ef6003]).
 I've only used these builds in testing, but seen no regressions yet. They 
reduce CPU consumption for automated (by shell script) admin tasks by 90% and 
memory consumption by 80%. I did this work before I discovered 
[https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md] which 
would have made things a lot easier :)

> Kafka Streams Apps to support small native images through GraalVM
> -
>
> Key: KAFKA-8629
> URL: https://issues.apache.org/jira/browse/KAFKA-8629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
> Environment: OSX
> Linux on Docker
>Reporter: Andy Muir
>Priority: Minor
>
> I'm investigating using [GraalVM|https://www.graalvm.org/] to help with reducing 
> docker image size and required resources for a simple Kafka Streams 
> microservice. To this end, I'm looking at running a microservice which:
> 1) consumes from a Kafka topic (XML)
> 2) transforms it into JSON
> 3) produces to a new Kafka topic.
> The Kafka Streams app running in the JVM works fine.
> When I attempt to build it into a GraalVM native image (a binary executable 
> which does not require the JVM, hence a smaller image size and fewer 
> resources), I encountered a few 
> [incompatibilities|https://github.com/oracle/graal/blob/master/substratevm/LIMITATIONS.md]
>  with the source code in Kafka.
> I've implemented a workaround for each of these in a fork (link to follow) to 
> help establish if it is feasible. I don't intend (at this stage) for the 
> changes to be applied to the broker - I'm only after Kafka Streams for now. 
> I'm not sure whether it'd be a good idea for the broker itself to run as a 
> native image!
> There were 2 issues getting the native image with kafka streams:
> 1) Some Reflection use cases using MethodHandle
> 2) Anything JMX
> To work around these issues, I have:
> 1) Replaced use of MethodHandle with alternatives
> 2) Commented out the JMX code in a few places
> While the first may be sustainable, I'd expect that the 2nd option should be 
> put behind a configuration switch to allow the existing code to be used by 
> default and turning off JMX if configured.
> *I haven't created a PR for now, as I'd like feedback to decide if it is 
> going to be feasible to carry this forwards.*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
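The "replaced use of MethodHandle with alternatives" workaround mentioned in the issue can be illustrated with a small, self-contained sketch. This is hypothetical; the method and class names below are illustrative, not the actual Kafka call sites that were changed. The idea is that a reflective MethodHandle lookup is opaque to GraalVM's closed-world analysis, while a plain method reference is resolved statically.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

public class MethodHandleAlternative {
    // Illustrative target method; not an actual Kafka call site.
    static int parse(String s) {
        return Integer.parseInt(s);
    }

    public static void main(String[] args) throws Throwable {
        // Reflective form: invisible to native-image analysis without
        // extra reflection configuration at build time.
        MethodHandle mh = MethodHandles.lookup().findStatic(
                MethodHandleAlternative.class, "parse",
                MethodType.methodType(int.class, String.class));
        System.out.println((int) mh.invokeExact("41")); // prints 41

        // Statically analyzable alternative: a plain method reference.
        Function<String, Integer> direct = MethodHandleAlternative::parse;
        System.out.println(direct.apply("42")); // prints 42
    }
}
```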


[jira] [Comment Edited] (KAFKA-8629) Kafka Streams Apps to support small native images through GraalVM

2019-07-11 Thread Staffan Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883067#comment-16883067
 ] 

Staffan Olsson edited comment on KAFKA-8629 at 7/11/19 3:08 PM:


Quarkus has a [Streams 
extension|[https://quarkus.io/guides/kafka-streams-guide]]. For details see 
[https://github.com/quarkusio/quarkus/pull/2693], 
[https://github.com/quarkusio/quarkus/issues/2863] and 
[https://github.com/quarkusio/quarkus/pull/2953.] Note also that IMO there's a 
blocker with Snappy and LZ4 disabled: 
[https://github.com/quarkusio/quarkus/issues/2718].

I've done some work on native builds of topic admin CLI tools, where the 
rewards in terms of reduced startup times really pay off: 
[https://github.com/solsson/dockerfiles/pull/25] ([example 
usage|https://github.com/Yolean/kubernetes-kafka/commit/af63569c41b78195b85c0ff7f919df93b2ef6003]).
 I've only used these builds in testing, but seen no regressions yet. They 
reduce CPU consumption for automated (by shell script) admin tasks by 90% and 
memory consumption by 80%. I did this work before I discovered 
[https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md] which 
would have made things a lot easier :)


was (Author: solsson):
Quarkus has a [Streams 
extension|[https://quarkus.io/guides/kafka-streams-guide]|https://quarkus.io/guides/kafka-streams-guide].
 For details see [https://github.com/quarkusio/quarkus/pull/2693], 
[https://github.com/quarkusio/quarkus/issues/2863] and 
[https://github.com/quarkusio/quarkus/pull/2953.] Note also that IMO there's a 
blocker with Snappy and LZ4 disabled: 
[https://github.com/quarkusio/quarkus/issues/2718].

I've done some work on native builds of topic admin CLI tools, where the 
rewards in terms of reduced startup times really pay off: 
[https://github.com/solsson/dockerfiles/pull/25] ([example 
usage|https://github.com/Yolean/kubernetes-kafka/commit/af63569c41b78195b85c0ff7f919df93b2ef6003]).
 I've only used these builds in testing, but seen no regressions yet. They 
reduce CPU consumption for automated (by shell script) admin tasks by 90% and 
memory consumption by 80%. I did this work before I discovered 
[https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md] which 
would have made things a lot easier :)

> Kafka Streams Apps to support small native images through GraalVM
> -
>
> Key: KAFKA-8629
> URL: https://issues.apache.org/jira/browse/KAFKA-8629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
> Environment: OSX
> Linux on Docker
>Reporter: Andy Muir
>Priority: Minor
>
> I'm investigating using [GraalVM|https://www.graalvm.org/] to help with reducing 
> docker image size and required resources for a simple Kafka Streams 
> microservice. To this end, I'm looking at running a microservice which:
> 1) consumes from a Kafka topic (XML)
> 2) transforms it into JSON
> 3) produces to a new Kafka topic.
> The Kafka Streams app running in the JVM works fine.
> When I attempt to build it into a GraalVM native image (a binary executable 
> which does not require the JVM, hence a smaller image size and fewer 
> resources), I encountered a few 
> [incompatibilities|https://github.com/oracle/graal/blob/master/substratevm/LIMITATIONS.md]
>  with the source code in Kafka.
> I've implemented a workaround for each of these in a fork (link to follow) to 
> help establish if it is feasible. I don't intend (at this stage) for the 
> changes to be applied to the broker - I'm only after Kafka Streams for now. 
> I'm not sure whether it'd be a good idea for the broker itself to run as a 
> native image!
> There were 2 issues getting the native image with kafka streams:
> 1) Some Reflection use cases using MethodHandle
> 2) Anything JMX
> To work around these issues, I have:
> 1) Replaced use of MethodHandle with alternatives
> 2) Commented out the JMX code in a few places
> While the first may be sustainable, I'd expect that the 2nd option should be 
> put behind a configuration switch to allow the existing code to be used by 
> default and turning off JMX if configured.
> *I haven't created a PR for now, as I'd like feedback to decide if it is 
> going to be feasible to carry this forwards.*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8629) Kafka Streams Apps to support small native images through GraalVM

2019-07-11 Thread Staffan Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883067#comment-16883067
 ] 

Staffan Olsson edited comment on KAFKA-8629 at 7/11/19 3:08 PM:


Quarkus has a [Streams 
extension|[https://quarkus.io/guides/kafka-streams-guide]|https://quarkus.io/guides/kafka-streams-guide].
 For details see [https://github.com/quarkusio/quarkus/pull/2693], 
[https://github.com/quarkusio/quarkus/issues/2863] and 
[https://github.com/quarkusio/quarkus/pull/2953.] Note also that IMO there's a 
blocker with Snappy and LZ4 disabled: 
[https://github.com/quarkusio/quarkus/issues/2718].

I've done some work on native builds of topic admin CLI tools, where the 
rewards in terms of reduced startup times really pay off: 
[https://github.com/solsson/dockerfiles/pull/25] ([example 
usage|https://github.com/Yolean/kubernetes-kafka/commit/af63569c41b78195b85c0ff7f919df93b2ef6003]).
 I've only used these builds in testing, but seen no regressions yet. They 
reduce CPU consumption for automated (by shell script) admin tasks by 90% and 
memory consumption by 80%. I did this work before I discovered 
[https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md] which 
would have made things a lot easier :)


was (Author: solsson):
Quarkus has a [Streams 
extension|[https://quarkus.io/guides/kafka-streams-guide]|https://quarkus.io/guides/kafka-streams-guide],].
 For details see [https://github.com/quarkusio/quarkus/pull/2693], 
[https://github.com/quarkusio/quarkus/issues/2863] and 
[https://github.com/quarkusio/quarkus/pull/2953.] Note also that IMO there's a 
blocker with Snappy and LZ4 disabled: 
[https://github.com/quarkusio/quarkus/issues/2718].

I've done some work on native builds of topic admin CLI tools, where the 
rewards in terms of reduced startup times really pay off: 
[https://github.com/solsson/dockerfiles/pull/25] ([example 
usage|https://github.com/Yolean/kubernetes-kafka/commit/af63569c41b78195b85c0ff7f919df93b2ef6003]).
 I've only used these builds in testing, but seen no regressions yet. They 
reduce CPU consumption for automated (by shell script) admin tasks by 90% and 
memory consumption by 80%. I did this work before I discovered 
[https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md] which 
would have made things a lot easier :)

> Kafka Streams Apps to support small native images through GraalVM
> -
>
> Key: KAFKA-8629
> URL: https://issues.apache.org/jira/browse/KAFKA-8629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
> Environment: OSX
> Linux on Docker
>Reporter: Andy Muir
>Priority: Minor
>
> I'm investigating using [GraalVM|https://www.graalvm.org/] to help with reducing 
> docker image size and required resources for a simple Kafka Streams 
> microservice. To this end, I'm looking at running a microservice which:
> 1) consumes from a Kafka topic (XML)
> 2) transforms it into JSON
> 3) produces to a new Kafka topic.
> The Kafka Streams app running in the JVM works fine.
> When I attempt to build it into a GraalVM native image (a binary executable 
> which does not require the JVM, hence a smaller image size and fewer 
> resources), I encountered a few 
> [incompatibilities|https://github.com/oracle/graal/blob/master/substratevm/LIMITATIONS.md]
>  with the source code in Kafka.
> I've implemented a workaround for each of these in a fork (link to follow) to 
> help establish if it is feasible. I don't intend (at this stage) for the 
> changes to be applied to the broker - I'm only after Kafka Streams for now. 
> I'm not sure whether it'd be a good idea for the broker itself to run as a 
> native image!
> There were 2 issues getting the native image with kafka streams:
> 1) Some Reflection use cases using MethodHandle
> 2) Anything JMX
> To work around these issues, I have:
> 1) Replaced use of MethodHandle with alternatives
> 2) Commented out the JMX code in a few places
> While the first may be sustainable, I'd expect that the 2nd option should be 
> put behind a configuration switch to allow the existing code to be used by 
> default and turning off JMX if configured.
> *I haven't created a PR for now, as I'd like feedback to decide if it is 
> going to be feasible to carry this forwards.*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8629) Kafka Streams Apps to support small native images through GraalVM

2019-07-11 Thread Staffan Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883067#comment-16883067
 ] 

Staffan Olsson commented on KAFKA-8629:
---

Quarkus has a [Streams 
extension|[https://quarkus.io/guides/kafka-streams-guide]|https://quarkus.io/guides/kafka-streams-guide],].
 For details see [https://github.com/quarkusio/quarkus/pull/2693], 
[https://github.com/quarkusio/quarkus/issues/2863] and 
[https://github.com/quarkusio/quarkus/pull/2953.] Note also that IMO there's a 
blocker with Snappy and LZ4 disabled: 
[https://github.com/quarkusio/quarkus/issues/2718].

I've done some work on native builds of topic admin CLI tools, where the 
rewards in terms of reduced startup times really pay off: 
[https://github.com/solsson/dockerfiles/pull/25] ([example 
usage|https://github.com/Yolean/kubernetes-kafka/commit/af63569c41b78195b85c0ff7f919df93b2ef6003]).
 I've only used these builds in testing, but seen no regressions yet. They 
reduce CPU consumption for automated (by shell script) admin tasks by 90% and 
memory consumption by 80%. I did this work before I discovered 
[https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md] which 
would have made things a lot easier :)

> Kafka Streams Apps to support small native images through GraalVM
> -
>
> Key: KAFKA-8629
> URL: https://issues.apache.org/jira/browse/KAFKA-8629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
> Environment: OSX
> Linux on Docker
>Reporter: Andy Muir
>Priority: Minor
>
> I'm investigating using [GraalVM|https://www.graalvm.org/] to help with reducing 
> docker image size and required resources for a simple Kafka Streams 
> microservice. To this end, I'm looking at running a microservice which:
> 1) consumes from a Kafka topic (XML)
> 2) transforms it into JSON
> 3) produces to a new Kafka topic.
> The Kafka Streams app running in the JVM works fine.
> When I attempt to build it into a GraalVM native image (a binary executable 
> which does not require the JVM, hence a smaller image size and fewer 
> resources), I encountered a few 
> [incompatibilities|https://github.com/oracle/graal/blob/master/substratevm/LIMITATIONS.md]
>  with the source code in Kafka.
> I've implemented a workaround for each of these in a fork (link to follow) to 
> help establish if it is feasible. I don't intend (at this stage) for the 
> changes to be applied to the broker - I'm only after Kafka Streams for now. 
> I'm not sure whether it'd be a good idea for the broker itself to run as a 
> native image!
> There were 2 issues getting the native image with kafka streams:
> 1) Some Reflection use cases using MethodHandle
> 2) Anything JMX
> To work around these issues, I have:
> 1) Replaced use of MethodHandle with alternatives
> 2) Commented out the JMX code in a few places
> While the first may be sustainable, I'd expect that the 2nd option should be 
> put behind a configuration switch to allow the existing code to be used by 
> default and turning off JMX if configured.
> *I haven't created a PR for now, as I'd like feedback to decide if it is 
> going to be feasible to carry this forwards.*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
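The suggestion above to put the JMX code behind a configuration switch could look roughly like the following sketch. This is a minimal, hedged illustration of the design choice (default on, opt-out for native-image builds); the property name "jmx.enable" is made up for this example and is not an existing Kafka configuration.

```java
import java.util.Properties;

public class JmxSwitchSketch {
    // Reads a hypothetical flag; the JMX/MBean registration code paths
    // would be skipped entirely when this returns false.
    static boolean jmxEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty("jmx.enable", "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Default keeps JMX on, so existing deployments are unaffected.
        System.out.println(jmxEnabled(props)); // prints true
        // A native-image build would set the flag off explicitly.
        props.setProperty("jmx.enable", "false");
        System.out.println(jmxEnabled(props)); // prints false
    }
}
```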


[jira] [Commented] (KAFKA-8602) StreamThread Dies Because Restore Consumer is not Subscribed to Any Topic

2019-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882839#comment-16882839
 ] 

ASF GitHub Bot commented on KAFKA-8602:
---

cadonna commented on pull request #7008: KAFKA-8602: Fix bug in stand-by task 
creation
URL: https://github.com/apache/kafka/pull/7008
 
 
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> StreamThread Dies Because Restore Consumer is not Subscribed to Any Topic
> -
>
> Key: KAFKA-8602
> URL: https://issues.apache.org/jira/browse/KAFKA-8602
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Critical
>
> StreamThread dies with the following exception:
> {code:java}
> java.lang.IllegalStateException: Consumer is not subscribed to any topics or 
> assigned any partitions
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1206)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1199)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeUpdateStandbyTasks(StreamThread.java:1126)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:923)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
> {code}
> The reason is that the restore consumer is not subscribed to any topic. This 
> happens when a StreamThread gets assigned standby tasks for sub-topologies 
> containing only state stores with logging disabled.
> To reproduce the bug, start two applications with one StreamThread and one 
> standby replica each, using the following topology. The input topic should 
> have two partitions:
> {code:java}
> final StreamsBuilder builder = new StreamsBuilder();
> final String stateStoreName = "myTransformState";
> final StoreBuilder<KeyValueStore<Integer, Integer>> keyValueStoreBuilder =
>     Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore(stateStoreName),
>                                 Serdes.Integer(),
>                                 Serdes.Integer())
>           .withLoggingDisabled();
> builder.addStateStore(keyValueStoreBuilder);
> builder.stream(INPUT_TOPIC, Consumed.with(Serdes.Integer(), Serdes.Integer()))
>     .transform(() -> new Transformer<Integer, Integer, KeyValue<Integer, Integer>>() {
>         private KeyValueStore<Integer, Integer> state;
> 
>         @SuppressWarnings("unchecked")
>         @Override
>         public void init(final ProcessorContext context) {
>             state = (KeyValueStore<Integer, Integer>) context.getStateStore(stateStoreName);
>         }
> 
>         @Override
>         public KeyValue<Integer, Integer> transform(final Integer key, final Integer value) {
>             final KeyValue<Integer, Integer> result = new KeyValue<>(key, value);
>             return result;
>         }
> 
>         @Override
>         public void close() {}
>     }, stateStoreName)
>     .to(OUTPUT_TOPIC);
> {code}
> Both StreamThreads should die with the above exception.
> The root cause is that standby tasks are created although all state stores of 
> the sub-topology have logging disabled.  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
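A minimal sketch of the fix direction implied by the root cause above: only create a standby task when at least one store in the sub-topology has logging enabled, i.e. has a changelog topic the restore consumer could subscribe to. The class and method names here are illustrative, not Kafka's internal API.

```java
import java.util.List;

public class StandbyTaskGuard {
    // A standby task only makes sense if there is at least one changelog
    // topic to restore from; with logging disabled on every store, the
    // restore consumer would end up subscribed to nothing, which is what
    // triggers the IllegalStateException in poll().
    static boolean shouldCreateStandbyTask(List<Boolean> storeLoggingEnabled) {
        return storeLoggingEnabled.stream().anyMatch(enabled -> enabled);
    }

    public static void main(String[] args) {
        // The topology in this ticket has a single store with logging disabled:
        System.out.println(shouldCreateStandbyTask(List.of(false)));       // prints false
        // A sub-topology with at least one logged store still gets a standby:
        System.out.println(shouldCreateStandbyTask(List.of(false, true))); // prints true
    }
}
```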


[jira] [Commented] (KAFKA-8602) StreamThread Dies Because Restore Consumer is not Subscribed to Any Topic

2019-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882837#comment-16882837
 ] 

ASF GitHub Bot commented on KAFKA-8602:
---

cadonna commented on pull request #7008: KAFKA-8602: Fix bug in stand-by task 
creation
URL: https://github.com/apache/kafka/pull/7008
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> StreamThread Dies Because Restore Consumer is not Subscribed to Any Topic
> -
>
> Key: KAFKA-8602
> URL: https://issues.apache.org/jira/browse/KAFKA-8602
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Critical
>
> StreamThread dies with the following exception:
> {code:java}
> java.lang.IllegalStateException: Consumer is not subscribed to any topics or 
> assigned any partitions
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1206)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1199)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeUpdateStandbyTasks(StreamThread.java:1126)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:923)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
> {code}
> The reason is that the restore consumer is not subscribed to any topic. This 
> happens when a StreamThread gets assigned standby tasks for sub-topologies 
> containing only state stores with logging disabled.
> To reproduce the bug, start two applications with one StreamThread and one 
> standby replica each, using the following topology. The input topic should 
> have two partitions:
> {code:java}
> final StreamsBuilder builder = new StreamsBuilder();
> final String stateStoreName = "myTransformState";
> final StoreBuilder<KeyValueStore<Integer, Integer>> keyValueStoreBuilder =
>     Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore(stateStoreName),
>                                 Serdes.Integer(),
>                                 Serdes.Integer())
>           .withLoggingDisabled();
> builder.addStateStore(keyValueStoreBuilder);
> builder.stream(INPUT_TOPIC, Consumed.with(Serdes.Integer(), Serdes.Integer()))
>     .transform(() -> new Transformer<Integer, Integer, KeyValue<Integer, Integer>>() {
>         private KeyValueStore<Integer, Integer> state;
> 
>         @SuppressWarnings("unchecked")
>         @Override
>         public void init(final ProcessorContext context) {
>             state = (KeyValueStore<Integer, Integer>) context.getStateStore(stateStoreName);
>         }
> 
>         @Override
>         public KeyValue<Integer, Integer> transform(final Integer key, final Integer value) {
>             final KeyValue<Integer, Integer> result = new KeyValue<>(key, value);
>             return result;
>         }
> 
>         @Override
>         public void close() {}
>     }, stateStoreName)
>     .to(OUTPUT_TOPIC);
> {code}
> Both StreamThreads should die with the above exception.
> The root cause is that standby tasks are created although all state stores of 
> the sub-topology have logging disabled.  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8655) Failed to start systemctl kafka (code=exited, status=1/FAILURE)

2019-07-11 Thread Ben (JIRA)
Ben created KAFKA-8655:
--

 Summary: Failed to start systemctl kafka (code=exited, 
status=1/FAILURE)
 Key: KAFKA-8655
 URL: https://issues.apache.org/jira/browse/KAFKA-8655
 Project: Kafka
  Issue Type: Bug
Reporter: Ben


Alright, so what I did was:
sudo systemctl enable confluent-zookeeper
sudo systemctl enable confluent-kafka
sudo systemctl start confluent-zookeeper
I got a file-access error; after a chmod, ZooKeeper now works fine.
sudo systemctl start confluent-kafka
I got an error I still couldn't fix; this is the output:

    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.j
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:335)
    at org.apache.kafka.common.record.FileRecords.openChannel(FileRecords.java:4
    at org.apache.kafka.common.record.FileRecords.open(FileRecords.java:410)
    at org.apache.kafka.common.record.FileRecords.open(FileRecords.java:419)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8654) Cant restart heartbeatThread if encountered unexpected exception in heartbeatloop.

2019-07-11 Thread nick allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nick allen updated KAFKA-8654:
--
Description: 
There is a consumer in our cluster which has relatively high cpu usage for 
several days caused by kafka poll thread. So we dig in to find out that was 
because 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator#timeToNextHeartbeat
 returned zero leading to non-blocking select which in turn leading to 
pollForFetches returned immediately. But the actual poll timeout is set to 1s, 
so pollForFetches was called thousands of time per poll/second.

We use tool to inspect memory variables which show the corresponding 
heartbeatTimer's attribute:  

@Timer[
 time=@SystemTime[org.apache.kafka.common.utils.SystemTime@4d806627],
 startMs=@Long[1562075783801], // Jul 02 2019 13:56:23
 currentTimeMs=@Long[1562823681506], // Thu Jul 11 2019 05:41:21
 deadlineMs=@Long[1562075793801], // Tue Jul 02 2019 13:56:33
 ]

That shows that heartbeat hasn't been happening for about 10 days, and *at 
07-02 13:56 we did restarted brokers*. And jstack shows the corresponding 
heartbeatThread is dead. Unfortunately we dont keep logs for that long so I 
cant figure out what happened then. 

IMO heartbeatThread is too important to be left dead, there should be at least 
some way to revive it, but it seems that startHeartbeatThreadIfNeeded can only 
be triggered by restarting or heartBeat itself.

It's also confusing that almost everything in 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.HeartbeatThread#run
 is async so it seems impossible for any exception to happen, so why is there 
so many catch clause?

 

  was:
There is a consumer in our cluster that has had relatively high CPU usage for 
several days, caused by the Kafka poll thread. We dug in and found that 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator#timeToNextHeartbeat
 returned zero, leading to a non-blocking select, which in turn caused 
pollForFetches to return immediately. But the actual poll timeout is set to 1s, 
so pollForFetches was called thousands of times per second.

We used a tool to inspect in-memory variables, which shows the corresponding 
heartbeatTimer's attributes:

@Timer[
 time=@SystemTime[org.apache.kafka.common.utils.SystemTime@4d806627],
 startMs=@Long[1562075783801], // Tue Jul 02 2019 13:56:23
 currentTimeMs=@Long[1562823681506], // Thu Jul 11 2019 05:41:21
 deadlineMs=@Long[1562075793801], // Tue Jul 02 2019 13:56:33
 ]

That shows that the heartbeat hasn't been happening for about 10 days, and at 
07-02 13:56 we did restart the brokers. jstack shows the corresponding 
heartbeatThread is dead. Unfortunately we don't keep logs that long, so I can't 
figure out what happened then.

IMO the heartbeatThread is too important to be left dead; there should be at 
least some way to revive it, but it seems that startHeartbeatThreadIfNeeded can 
only be triggered by restarting or by the heartbeat itself.

It's also confusing that almost everything in 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.HeartbeatThread#run
 is async, so it seems impossible for any exception to happen there; why are 
there so many catch clauses?

 


> Cant restart heartbeatThread if encountered unexpected exception in 
> heartbeatloop.
> --
>
> Key: KAFKA-8654
> URL: https://issues.apache.org/jira/browse/KAFKA-8654
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.1.0
>Reporter: nick allen
>Priority: Major
>
> There is a consumer in our cluster that has had relatively high CPU usage for 
> several days, caused by the Kafka poll thread. We dug in and found that 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator#timeToNextHeartbeat
>  returned zero, leading to a non-blocking select, which in turn caused 
> pollForFetches to return immediately. But the actual poll timeout is set to 
> 1s, so pollForFetches was called thousands of times per second.
> We used a tool to inspect in-memory variables, which shows the corresponding 
> heartbeatTimer's attributes:
> @Timer[
>  time=@SystemTime[org.apache.kafka.common.utils.SystemTime@4d806627],
>  startMs=@Long[1562075783801], // Tue Jul 02 2019 13:56:23
>  currentTimeMs=@Long[1562823681506], // Thu Jul 11 2019 05:41:21
>  deadlineMs=@Long[1562075793801], // Tue Jul 02 2019 13:56:33
>  ]
> That shows that the heartbeat hasn't been happening for about 10 days, and *at 
> 07-02 13:56 we did restart the brokers*. jstack shows the corresponding 
> heartbeatThread is dead. Unfortunately we don't keep logs that long, so I 
> can't figure out what happened then.
> IMO heartbeatThread is too important to be left dead, there should be at 
> least some way to revive it, but it 

[jira] [Updated] (KAFKA-8654) Cant restart heartbeatThread if encountered unexpected exception in heartbeatloop.

2019-07-11 Thread nick allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nick allen updated KAFKA-8654:
--
Description: 
There is a consumer in our cluster that has had relatively high CPU usage for 
several days, caused by the Kafka poll thread. We dug in and found that 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator#timeToNextHeartbeat
 returned zero, leading to a non-blocking select, which in turn caused 
pollForFetches to return immediately. But the actual poll timeout is set to 1s, 
so pollForFetches was called thousands of times per second.

We used a tool to inspect in-memory variables, which shows the corresponding 
heartbeatTimer's attributes:

@Timer[
 time=@SystemTime[org.apache.kafka.common.utils.SystemTime@4d806627],
 startMs=@Long[1562075783801], // Tue Jul 02 2019 13:56:23
 currentTimeMs=@Long[1562823681506], // Thu Jul 11 2019 05:41:21
 deadlineMs=@Long[1562075793801], // Tue Jul 02 2019 13:56:33
 ]

That shows that the heartbeat hasn't been happening for about 10 days, and at 
07-02 13:56 we did restart the brokers. jstack shows the corresponding 
heartbeatThread is dead. Unfortunately we don't keep logs that long, so I can't 
figure out what happened then.

IMO the heartbeatThread is too important to be left dead; there should be at 
least some way to revive it, but it seems that startHeartbeatThreadIfNeeded can 
only be triggered by restarting or by the heartbeat itself.

It's also confusing that almost everything in 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.HeartbeatThread#run
 is async, so it seems impossible for any exception to happen there; why are 
there so many catch clauses?

 

  was:
There is a consumer in our cluster that has had relatively high CPU usage for 
several days, caused by the Kafka poll thread. We dug in and found that 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator#timeToNextHeartbeat
 returned zero, leading to a non-blocking select, which in turn caused 
pollForFetches to return immediately. But the actual poll timeout is set to 1s, 
so pollForFetches was called thousands of times per second.

We used a tool to inspect in-memory variables, which shows the corresponding 
heartbeatTimer's attributes:

@Timer[
 time=@SystemTime[org.apache.kafka.common.utils.SystemTime@4d806627],
 startMs=@Long[1562075783801], // Tue Jul 02 2019 13:56:23
 currentTimeMs=@Long[1562823681506], // Thu Jul 11 2019 05:41:21
 deadlineMs=@Long[1562075793801], // Tue Jul 02 2019 13:56:33
 ]

That shows that the heartbeat hasn't been happening for about 10 days. jstack 
shows the corresponding heartbeatThread is dead. Unfortunately we don't keep 
logs that long, so I can't figure out what happened then.

IMO the heartbeatThread is too important to be left dead; there should be at 
least some way to revive it, but it seems that startHeartbeatThreadIfNeeded can 
only be triggered by restarting or by the heartbeat itself.

It's also confusing that almost everything in 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.HeartbeatThread#run
 is async, so it seems impossible for any exception to happen there; why are 
there so many catch clauses?

 


> Cant restart heartbeatThread if encountered unexpected exception in 
> heartbeatloop.
> --
>
> Key: KAFKA-8654
> URL: https://issues.apache.org/jira/browse/KAFKA-8654
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.1.0
>Reporter: nick allen
>Priority: Major
>



[jira] [Updated] (KAFKA-8654) Cant restart heartbeatThread if encountered unexpected exception in heartbeatloop.

2019-07-11 Thread nick allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nick allen updated KAFKA-8654:
--
Summary: Cant restart heartbeatThread if encountered unexpected exception 
in heartbeatloop.  (was: Cant restart heartbeatThread if encountered unexpected 
exception in heartbeatloop。)




[jira] [Created] (KAFKA-8654) Cant restart heartbeatThread if encountered unexpected exception in heartbeatloop。

2019-07-11 Thread nick allen (JIRA)
nick allen created KAFKA-8654:
-

 Summary: Cant restart heartbeatThread if encountered unexpected 
exception in heartbeatloop。
 Key: KAFKA-8654
 URL: https://issues.apache.org/jira/browse/KAFKA-8654
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 2.1.0
Reporter: nick allen



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)