[jira] [Updated] (KAFKA-3802) log mtimes reset on broker restart

2016-06-15 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-3802:
---
Status: Patch Available  (was: Open)

> log mtimes reset on broker restart
> --
>
> Key: KAFKA-3802
> URL: https://issues.apache.org/jira/browse/KAFKA-3802
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Otto
> Fix For: 0.10.0.1
>
>
> Folks over in 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAO8=cz0ragjad1acx4geqcwj+rkd1gmdavkjwytwthkszfg...@mail.gmail.com%3E
>  are commenting about this issue.
> In 0.9, any data log file that was on
> disk before the broker restart has its mtime modified to the time of the
> broker restart.
> This causes problems with log retention, as all the files then look like
> they contain recent data to Kafka.  We use the default log retention of 7
> days, but if all the files are touched at the same time, this can cause us
> to retain up to 2 weeks of log data, which can fill up our disks.
> This happens *most* of the time, but seemingly not all.  We have seen broker 
> restarts where mtimes were not changed.
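A minimal sketch of why reset mtimes defeat time-based retention, assuming a
simplified mtime rule rather than Kafka's actual LogManager code (the segment
path is hypothetical):

import java.io.File;
import java.util.concurrent.TimeUnit;

public class MtimeRetentionSketch {
    static final long RETENTION_MS = TimeUnit.DAYS.toMillis(7);

    // A segment is deletable only once now - lastModified exceeds the window,
    // so touching every segment at restart restarts the 7-day clock for all.
    static boolean eligibleForDeletion(File segment, long nowMs) {
        return nowMs - segment.lastModified() > RETENTION_MS;
    }

    public static void main(String[] args) {
        File segment = new File("/var/kafka-logs/topic-0/00000000000000000000.log");
        System.out.println("delete? " + eligibleForDeletion(segment, System.currentTimeMillis()));
    }
}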



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk8 #699

2016-06-15 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-06-15 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332996#comment-15332996
 ] 

Jason Gustafson commented on KAFKA-3144:


[~junrao], [~vahid] Another helpful feature would be the ability to list the 
last committed offsets for groups with no active members. This may not have 
been possible prior to KAFKA-2720 because ListGroups only returned non-empty 
groups, but now it should return all of them. In that case maybe we could 
display "N/A" (or something like that) as the owner of the partition. We could 
do that here or we could open a new JIRA. What do you think?
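A hedged sketch of the idea: the 0.9+ Java consumer can fetch a group's last
committed offset without joining the group, which is all the tool would need
for inactive groups (group, topic, and partition below are hypothetical):

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class InactiveGroupOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            // Issues an OffsetFetch for the group without joining it.
            OffsetAndMetadata committed = consumer.committed(tp);
            System.out.printf("group=my-group %s offset=%s owner=N/A%n", tp,
                    committed == null ? "unknown" : committed.offset());
        }
    }
}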

> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3850) WorkerSinkTask should retry commits if woken up during rebalance or shutdown

2016-06-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332964#comment-15332964
 ] 

ASF GitHub Bot commented on KAFKA-3850:
---

GitHub user hachikuji reopened a pull request:

https://github.com/apache/kafka/pull/1511

KAFKA-3850: WorkerSinkTask commit prior to rebalance should be retried on 
wakeup



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka 
retry-commit-on-wakeup-in-sinks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1511.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1511


commit 0e2e4568325a52611f67e172688e2ade86e8bdf5
Author: Jason Gustafson 
Date:   2016-06-16T00:40:32Z

KAFKA-??: WorkerSinkTask commit prior to rebalance should be retried on 
wakeup




> WorkerSinkTask should retry commits if woken up during rebalance or shutdown
> 
>
> Key: KAFKA-3850
> URL: https://issues.apache.org/jira/browse/KAFKA-3850
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.10.0.1
>
>
> We use consumer.wakeup() to interrupt long polls when we need to pause/resume 
> partitions and when we shut down sink tasks. The resulting {{WakeupException}} 
> could be raised from the synchronous commit which we use in between 
> rebalances and on shutdown. Since we don't currently catch this exception, we 
> can fail to commit offsets, which typically results in duplicates. To fix 
> this problem, we should catch the exception, retry the commit, and then 
> rethrow it.
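A minimal sketch of the fix described above, assuming the 0.9/0.10 Java
consumer API (an illustration, not the actual WorkerSinkTask patch):

import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.WakeupException;

public final class CommitRetrySketch {
    static void commitWithRetryOnWakeup(KafkaConsumer<?, ?> consumer,
                                        Map<TopicPartition, OffsetAndMetadata> offsets) {
        try {
            consumer.commitSync(offsets);
        } catch (WakeupException e) {
            consumer.commitSync(offsets); // retry: the offsets must still be committed
            throw e;                      // rethrow so the wakeup still takes effect
        }
    }
}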



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3850) WorkerSinkTask should retry commits if woken up during rebalance or shutdown

2016-06-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332963#comment-15332963
 ] 

ASF GitHub Bot commented on KAFKA-3850:
---

Github user hachikuji closed the pull request at:

https://github.com/apache/kafka/pull/1511


> WorkerSinkTask should retry commits if woken up during rebalance or shutdown
> 
>
> Key: KAFKA-3850
> URL: https://issues.apache.org/jira/browse/KAFKA-3850
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.10.0.1
>
>
> We use consumer.wakeup() to interrupt long polls when we need to pause/resume 
> partitions and when we shut down sink tasks. The resulting {{WakeupException}} 
> could be raised from the synchronous commit which we use in between 
> rebalances and on shutdown. Since we don't currently catch this exception, we 
> can fail to commit offsets, which typically results in duplicates. To fix 
> this problem, we should catch the exception, retry the commit, and then 
> rethrow it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3850) WorkerSinkTask should retry commits if woken up during rebalance or shutdown

2016-06-15 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3850:
---
Status: Patch Available  (was: Open)

> WorkerSinkTask should retry commits if woken up during rebalance or shutdown
> 
>
> Key: KAFKA-3850
> URL: https://issues.apache.org/jira/browse/KAFKA-3850
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.10.0.1
>
>
> We use consumer.wakeup() to interrupt long polls when we need to pause/resume 
> partitions and when we shut down sink tasks. The resulting {{WakeupException}} 
> could be raised from the synchronous commit which we use in between 
> rebalances and on shutdown. Since we don't currently catch this exception, we 
> can fail to commit offsets, which typically results in duplicates. To fix 
> this problem, we should catch the exception, retry the commit, and then 
> rethrow it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1511: KAFKA-3850: WorkerSinkTask commit prior to rebalan...

2016-06-15 Thread hachikuji
Github user hachikuji closed the pull request at:

https://github.com/apache/kafka/pull/1511


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-3850) WorkerSinkTask should retry commits if woken up during rebalance or shutdown

2016-06-15 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-3850:
--

 Summary: WorkerSinkTask should retry commits if woken up during 
rebalance or shutdown
 Key: KAFKA-3850
 URL: https://issues.apache.org/jira/browse/KAFKA-3850
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.10.0.0
Reporter: Jason Gustafson
Assignee: Jason Gustafson
 Fix For: 0.10.0.1


We use consumer.wakeup() to interrupt long polls when we need to pause/resume 
partitions and when we shut down sink tasks. The resulting {{WakeupException}} 
could be raised from the synchronous commit which we use in between rebalances 
and on shutdown. Since we don't currently catch this exception, we can fail to 
commit offsets, which typically results in duplicates. To fix this problem, we 
should catch the exception, retry the commit, and then rethrow it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1511: KAFKA-3850: WorkerSinkTask commit prior to rebalan...

2016-06-15 Thread hachikuji
GitHub user hachikuji reopened a pull request:

https://github.com/apache/kafka/pull/1511

KAFKA-3850: WorkerSinkTask commit prior to rebalance should be retried on 
wakeup



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka 
retry-commit-on-wakeup-in-sinks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1511.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1511


commit 0e2e4568325a52611f67e172688e2ade86e8bdf5
Author: Jason Gustafson 
Date:   2016-06-16T00:40:32Z

KAFKA-??: WorkerSinkTask commit prior to rebalance should be retried on 
wakeup




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-2720) Periodic purging groups in the coordinator

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-2720:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1427
[https://github.com/apache/kafka/pull/1427]

> Periodic purging groups in the coordinator
> --
>
> Key: KAFKA-2720
> URL: https://issues.apache.org/jira/browse/KAFKA-2720
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0
>
>
> Currently the coordinator removes the group (i.e. both removing it from the 
> cache and writing the tombstone message on its local replica without waiting 
> for ack) once it becomes an empty group.
> This can lead to a few issues, such as 1) group removal and creation churn 
> when a group with very few members is being rebalanced, and 2) if the local 
> write fails or is not propagated to other followers, the group can only be 
> removed again when a new coordinator is migrated and detects that the group 
> already has no members.
> We could instead piggy-back group purging on the periodic offset expiration, 
> removing any group that no longer has corresponding offsets in the cache.
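A loose Java sketch of that proposal (the real coordinator is Scala and more
involved; GroupMetadata below is a placeholder type and the retention value is
arbitrary):

import java.util.Iterator;
import java.util.Map;

final class GroupPurgeSketch {
    static final class GroupMetadata {
        boolean hasMembers;
        Map<String, Long> offsetCommitTimesMs; // topic-partition -> commit time
    }

    // Run from the periodic offset-expiration task: expire old offsets, then
    // purge groups that are empty AND have no committed offsets left, instead
    // of deleting a group eagerly the moment it becomes empty.
    static void expireOffsetsAndPurgeGroups(Map<String, GroupMetadata> cache, long nowMs) {
        Iterator<Map.Entry<String, GroupMetadata>> it = cache.entrySet().iterator();
        while (it.hasNext()) {
            GroupMetadata g = it.next().getValue();
            g.offsetCommitTimesMs.values().removeIf(t -> nowMs - t > 24L * 60 * 60 * 1000);
            if (!g.hasMembers && g.offsetCommitTimesMs.isEmpty()) {
                it.remove(); // purge the group (and write its tombstone) here
            }
        }
    }
}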



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2720) Periodic purging groups in the coordinator

2016-06-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332949#comment-15332949
 ] 

ASF GitHub Bot commented on KAFKA-2720:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1427


> Periodic purging groups in the coordinator
> --
>
> Key: KAFKA-2720
> URL: https://issues.apache.org/jira/browse/KAFKA-2720
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0
>
>
> Currently the coordinator removes the group (i.e. both removing it from the 
> cache and writing the tombstone message on its local replica without waiting 
> for ack) once it becomes an empty group.
> This can lead to a few issues, such as 1) group removal and creation churn 
> when a group with very few members is being rebalanced, and 2) if the local 
> write fails or is not propagated to other followers, the group can only be 
> removed again when a new coordinator is migrated and detects that the group 
> already has no members.
> We could instead piggy-back group purging on the periodic offset expiration, 
> removing any group that no longer has corresponding offsets in the cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1427: KAFKA-2720: expire group metadata when all offsets...

2016-06-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1427


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3443) Support regex topics in addSource() and stream()

2016-06-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332913#comment-15332913
 ] 

ASF GitHub Bot commented on KAFKA-3443:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1477


> Support regex topics in addSource() and stream()
> 
>
> Key: KAFKA-3443
> URL: https://issues.apache.org/jira/browse/KAFKA-3443
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Currently Kafka Streams only supports specific topics when creating source 
> streams, while we could leverage the consumer's regex subscription to allow 
> regex topics as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1477: KAFKA-3443 [Kafka Stream] support for adding sourc...

2016-06-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1477


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3443) Support regex topics in addSource() and stream()

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3443:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1477
[https://github.com/apache/kafka/pull/1477]

> Support regex topics in addSource() and stream()
> 
>
> Key: KAFKA-3443
> URL: https://issues.apache.org/jira/browse/KAFKA-3443
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Currently Kafka Streams only supports specific topics when creating source 
> streams, while we could leverage the consumer's regex subscription to allow 
> regex topics as well.
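A hedged sketch of the API this resolves, using the 0.10.x KStreamBuilder
(the topic pattern and sink topic are hypothetical):

import java.util.regex.Pattern;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class RegexSourceSketch {
    public static void main(String[] args) {
        KStreamBuilder builder = new KStreamBuilder();
        // One source stream over every topic matching the pattern, backed by
        // the consumer's regex subscription (new matches picked up over time).
        KStream<String, String> metrics = builder.stream(Pattern.compile("metrics-.*"));
        metrics.to("all-metrics");
    }
}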



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #698

2016-06-15 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3753: Add approximateNumEntries() method to KeyValueStore

--
[...truncated 12519 lines...]
org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByAlias STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByAlias PASSED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByShortAlias STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByShortAlias PASSED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidConnector STARTED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidConnector PASSED

org.apache.kafka.connect.runtime.WorkerTest > testReconfigureConnectorTasks STARTED

org.apache.kafka.connect.runtime.WorkerTest > testReconfigureConnectorTasks PASSED

org.apache.kafka.connect.runtime.WorkerTest > testAddRemoveTask STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddRemoveTask PASSED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidTask STARTED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidTask PASSED

org.apache.kafka.connect.runtime.WorkerTest > testCleanupTasksOnStop STARTED

org.apache.kafka.connect.runtime.WorkerTest > testCleanupTasksOnStop PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testPollsInBackground STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testPollsInBackground PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommit STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommit PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTaskFlushFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTaskFlushFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTaskSuccessAndFlushFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTaskSuccessAndFlushFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitConsumerFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitConsumerFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTimeout STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTimeout PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testAssignmentPauseResume STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testAssignmentPauseResume PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testRewind STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testRewind PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testNormalJoinGroupFollower STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testNormalJoinGroupFollower PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testMetadata STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testMetadata PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testLeaderPerformAssignment1 STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testLeaderPerformAssignment1 PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testLeaderPerformAssignment2 STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testLeaderPerformAssignment2 PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testJoinLeaderCannotAssign STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testJoinLeaderCannotAssign PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testRejoinGroup STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testRejoinGroup PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testNormalJoinGroupLeader STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > testNormalJoinGroupLeader PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testCreateConnector STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testCreateConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartUnknownTask STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartUnknownTask PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorConfigAdded STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorConfigAdded PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartTaskRedirectToLeader STARTED


[GitHub] kafka pull request #1511: KAFKA-XXXX: WorkerSinkTask commit prior to rebalan...

2016-06-15 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1511

KAFKA-XXXX: WorkerSinkTask commit prior to rebalance should be retried on 
wakeup



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka 
retry-commit-on-wakeup-in-sinks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1511.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1511


commit 0e2e4568325a52611f67e172688e2ade86e8bdf5
Author: Jason Gustafson 
Date:   2016-06-16T00:40:32Z

KAFKA-??: WorkerSinkTask commit prior to rebalance should be retried on 
wakeup




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1486: KAFKA-3753: Add approximateNumEntries() method to ...

2016-06-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1486


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


RE: Embedding zookeeper and kafka in java process.

2016-06-15 Thread Subhash Agrawal
Thanks for quick response.
I started zookeeper via zookeeper-server-start.bat and started kafka via my 
java process, and I saw the same error.
But if I start zookeeper via my java process and start kafka via 
kafka-server-start.bat, it works fine.
That means the error is not caused by both being started in the same process. 
It must be some Kafka-specific issue.

Subhash Agrawal

-Original Message-
From: Guozhang Wang [mailto:wangg...@gmail.com] 
Sent: Wednesday, June 15, 2016 3:42 PM
To: dev@kafka.apache.org
Subject: Re: Embedding zookeeper and kafka in java process.

It seems "scala.MatchError: null" are not related to the settings that ZK
and Kafka is embedded in the same process, and the only case that I can
think of related is this: https://issues.apache.org/jira/browse/KAFKA-940.

Could you clarify if you start these two services on two processes, the
issue goes away?

Guozhang

On Wed, Jun 15, 2016 at 1:56 PM, Subhash Agrawal 
wrote:

> Hi All,
> I am embedding Kafka 0.10.0 and corresponding zookeeper in java process.
> In this process, I start zookeeper first and then wait for 10 seconds and
> then start kafka. These are all running in the same process. Toward the
> end of kafka startup, I see following exception. It seems zookeeper is not
> able
> to add the newly created kafka instance. Have you seen this error
> earlier?  I have only single node kafka.
>
> Let me know if you have any suggestions. I will really appreciate any help
> on this.
>
> Thanks
> Subhash Agrawal.
>
> [2016-06-15 13:39:39,616] INFO [Logserver_Starter] Registered broker 0 at
> path /brokers/ids/0 with addresses: PLAINTEXT ->
> EndPoint(localhost,8392,PLAINTEXT) (kafka.utils.ZkUtils)
> [2016-06-15 13:39:39,617] WARN [Logserver_Starter] No meta.properties file
> under dir C:\development \newkafka-logs\meta.properties
> (kafka.server.BrokerMetadataCheckpoint)
> [2016-06-15 13:39:39,627] INFO [ZkClient-EventThread-24-localhost:2181]
> New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> [2016-06-15 13:39:39,629] INFO [ZkClient-EventThread-24-localhost:2181]
> [BrokerChangeListener on Controller 0]: Broker change listener fired for
> path /brokers/ids with children 0
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2016-06-15 13:39:39,638] INFO [Logserver_Starter] Kafka version :
> 0.10.0.0 (org.apache.kafka.common.utils.AppInfoParser)
> [2016-06-15 13:39:39,638] INFO [Logserver_Starter] Kafka commitId :
> b8642491e78c5a13 (org.apache.kafka.common.utils.AppInfoParser)
> [2016-06-15 13:39:39,639] INFO [Logserver_Starter] [Kafka Server 0],
> started (kafka.server.KafkaServer)
> [2016-06-15 13:39:39,806] INFO [ZkClient-EventThread-24-localhost:2181]
> [BrokerChangeListener on Controller 0]: Newly added brokers: 0, deleted
> brokers: , all live brokers: 0
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2016-06-15 13:39:39,808] DEBUG [ZkClient-EventThread-24-localhost:2181]
> [Channel manager on controller 0]: Controller 0 trying to connect to broker
> 0 (kafka.controller.ControllerChannelManager)
> [2016-06-15 13:39:39,818] ERROR [ZkClient-EventThread-24-localhost:2181]
> [BrokerChangeListener on Controller 0]: Error while handling broker changes
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> scala.MatchError: null
> at
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:122)
> at
> kafka.controller.ControllerChannelManager.addBroker(ControllerChannelManager.scala:74)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$4.apply(ReplicaStateMachine.scala:372)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$4.apply(ReplicaStateMachine.scala:372)
> at
> scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:372)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at
> 
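A minimal, hedged sketch of the embedded setup this thread describes, using
the 0.10-era ZooKeeper and Kafka server APIs (ports, directories, and
property values are illustrative):

import java.io.File;
import java.net.InetSocketAddress;
import java.util.Properties;
import kafka.server.KafkaConfig;
import kafka.server.KafkaServerStartable;
import org.apache.zookeeper.server.ServerCnxnFactory;
import org.apache.zookeeper.server.ZooKeeperServer;

public class EmbeddedKafkaSketch {
    public static void main(String[] args) throws Exception {
        // Start an embedded ZooKeeper on 2181 in this JVM.
        File zkDir = new File("/tmp/embedded-zk");
        ServerCnxnFactory factory =
                ServerCnxnFactory.createFactory(new InetSocketAddress(2181), 50);
        factory.startup(new ZooKeeperServer(zkDir, zkDir, 2000));

        // Then start a single broker in the same JVM.
        Properties props = new Properties();
        props.put("broker.id", "0");
        props.put("zookeeper.connect", "localhost:2181");
        props.put("listeners", "PLAINTEXT://localhost:8392"); // port from the log above
        props.put("log.dirs", "/tmp/embedded-kafka-logs");
        KafkaServerStartable broker = new KafkaServerStartable(new KafkaConfig(props));
        broker.startup();

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            broker.shutdown();
            factory.shutdown();
        }));
    }
}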

[jira] [Commented] (KAFKA-3818) Change Mirror Maker default assignment strategy to round robin

2016-06-15 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332801#comment-15332801
 ] 

Vahid Hashemian commented on KAFKA-3818:


[~granthenke] I believe the problem you mentioned still exists in the round 
robin assignor. Improving this assignor from the fairness point of view would 
very likely involve additional overhead (the KIP-49 implementation is an 
example that brings fairness at the expense of a more complex solution). Going 
back to [~hachikuji]'s comment earlier, is this additional overhead acceptable 
for a default assignment strategy?

> Change Mirror Maker default assignment strategy to round robin
> --
>
> Key: KAFKA-3818
> URL: https://issues.apache.org/jira/browse/KAFKA-3818
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>
> It might make more sense to use round robin assignment by default for MM 
> since it gives a better balance between the instances, in particular when the 
> number of MM instances exceeds the typical number of partitions per topic. 
> There doesn't seem to be any need to keep range assignment since 
> copartitioning is not an issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3818) Change Mirror Maker default assignment strategy to round robin

2016-06-15 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332798#comment-15332798
 ] 

Jason Gustafson edited comment on KAFKA-3818 at 6/15/16 11:04 PM:
--

[~granthenke] Good question. The round robin assignor used by the new consumer 
is a little different. It doesn't use the same hashing approach as the old one. 
It just takes all the partitions from all the subscribed topics and lays them 
out in order in an array (so that all partitions from the same topic are next 
to each other). It then round-robins through the consumers and assigns 
partitions one by one. The consumers are sorted by the memberId given to them 
by the coordinator, which is "\{clientId\}-\{uuid\}". If all consumers in a 
group use the same clientId, then the ordering of the consumers should be 
random, and this approach ought to give a good distribution for each topic, 
but it would be nice to verify whether this is actually true. In any case, 
even if the distribution is less than ideal, it might still be better than 
that provided by range, especially when you have a lot of topics with few 
partitions. 


was (Author: hachikuji):
[~granthenke] Good question. The round robin assignor used by the new consumer 
is a little different. It doesn't use the same hashing approach as the old one. 
It just takes all the partitions from all the subscribed topics and lays them 
out in order in an array (so that all partitions from the same topic are next 
to each other). It then round-robins through the consumers and assigns 
partitions one by one. The consumers are sorted by the memberId given to them 
by the coordinator, which is "{clientId}-{uuid}". If all consumers in a group 
use the same clientId, then the ordering of the consumers should be random, 
and this approach ought to give a good distribution for each topic, but it 
would be nice to verify whether this is actually true. In any case, even if 
the distribution is less than ideal, it might still be better than that 
provided by range, especially when you have a lot of topics with few 
partitions. 

> Change Mirror Maker default assignment strategy to round robin
> --
>
> Key: KAFKA-3818
> URL: https://issues.apache.org/jira/browse/KAFKA-3818
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>
> It might make more sense to use round robin assignment by default for MM 
> since it gives a better balance between the instances, in particular when the 
> number of MM instances exceeds the typical number of partitions per topic. 
> There doesn't seem to be any need to keep range assignment since 
> copartitioning is not an issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3818) Change Mirror Maker default assignment strategy to round robin

2016-06-15 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332798#comment-15332798
 ] 

Jason Gustafson commented on KAFKA-3818:


[~granthenke] Good question. The round robin assignor used by the new consumer 
is a little different. It doesn't use the same hashing approach as the old one. 
It just takes all the partitions from all the subscribed topics and lays them 
out in order in an array (so that all partitions from the same topic are next 
to each other). It then round-robins through the consumers and assigns 
partitions one by one. The consumers are sorted by the memberId given to them 
by the coordinator, which is "{clientId}-{uuid}". If all consumers in a group 
use the same clientId, then the ordering of the consumers should be random, 
and this approach ought to give a good distribution for each topic, but it 
would be nice to verify whether this is actually true. In any case, even if 
the distribution is less than ideal, it might still be better than that 
provided by range, especially when you have a lot of topics with few 
partitions. 
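A toy sketch of the layout just described (not the real RoundRobinAssignor; it
assumes every member subscribes to every topic):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class RoundRobinSketch {
    static Map<String, List<String>> assign(TreeSet<String> memberIds,
                                            Map<String, Integer> partitionsPerTopic) {
        Map<String, List<String>> assignment = new HashMap<>();
        List<String> members = new ArrayList<>(memberIds); // sorted by memberId
        for (String m : members) assignment.put(m, new ArrayList<>());

        List<String> layout = new ArrayList<>();
        for (Map.Entry<String, Integer> t : partitionsPerTopic.entrySet())
            for (int p = 0; p < t.getValue(); p++)
                layout.add(t.getKey() + "-" + p); // partitions of a topic stay adjacent

        int i = 0;
        for (String partition : layout)           // deal partitions around the circle
            assignment.get(members.get(i++ % members.size())).add(partition);
        return assignment;
    }
}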

> Change Mirror Maker default assignment strategy to round robin
> --
>
> Key: KAFKA-3818
> URL: https://issues.apache.org/jira/browse/KAFKA-3818
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>
> It might make more sense to use round robin assignment by default for MM 
> since it gives a better balance between the instances, in particular when the 
> number of MM instances exceeds the typical number of partitions per topic. 
> There doesn't seem to be any need to keep range assignment since 
> copartitioning is not an issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Embedding zookeeper and kafka in java process.

2016-06-15 Thread Guozhang Wang
It seems "scala.MatchError: null" are not related to the settings that ZK
and Kafka is embedded in the same process, and the only case that I can
think of related is this: https://issues.apache.org/jira/browse/KAFKA-940.

Could you clarify if you start these two services on two processes, the
issue goes away?

Guozhang

On Wed, Jun 15, 2016 at 1:56 PM, Subhash Agrawal 
wrote:

> Hi All,
> I am embedding Kafka 0.10.0 and corresponding zookeeper in java process.
> In this process, I start zookeeper first and then wait for 10 seconds and
> then start kafka. These are all running in the same process. Toward the
> end of kafka startup, I see following exception. It seems zookeeper is not
> able
> to add the newly created kafka instance. Have you seen this error
> earlier?  I have only single node kafka.
>
> Let me know if you have any suggestions. I will really appreciate any help
> on this.
>
> Thanks
> Subhash Agrawal.
>
> [2016-06-15 13:39:39,616] INFO [Logserver_Starter] Registered broker 0 at
> path /brokers/ids/0 with addresses: PLAINTEXT ->
> EndPoint(localhost,8392,PLAINTEXT) (kafka.utils.ZkUtils)
> [2016-06-15 13:39:39,617] WARN [Logserver_Starter] No meta.properties file
> under dir C:\development \newkafka-logs\meta.properties
> (kafka.server.BrokerMetadataCheckpoint)
> [2016-06-15 13:39:39,627] INFO [ZkClient-EventThread-24-localhost:2181]
> New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> [2016-06-15 13:39:39,629] INFO [ZkClient-EventThread-24-localhost:2181]
> [BrokerChangeListener on Controller 0]: Broker change listener fired for
> path /brokers/ids with children 0
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2016-06-15 13:39:39,638] INFO [Logserver_Starter] Kafka version :
> 0.10.0.0 (org.apache.kafka.common.utils.AppInfoParser)
> [2016-06-15 13:39:39,638] INFO [Logserver_Starter] Kafka commitId :
> b8642491e78c5a13 (org.apache.kafka.common.utils.AppInfoParser)
> [2016-06-15 13:39:39,639] INFO [Logserver_Starter] [Kafka Server 0],
> started (kafka.server.KafkaServer)
> [2016-06-15 13:39:39,806] INFO [ZkClient-EventThread-24-localhost:2181]
> [BrokerChangeListener on Controller 0]: Newly added brokers: 0, deleted
> brokers: , all live brokers: 0
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2016-06-15 13:39:39,808] DEBUG [ZkClient-EventThread-24-localhost:2181]
> [Channel manager on controller 0]: Controller 0 trying to connect to broker
> 0 (kafka.controller.ControllerChannelManager)
> [2016-06-15 13:39:39,818] ERROR [ZkClient-EventThread-24-localhost:2181]
> [BrokerChangeListener on Controller 0]: Error while handling broker changes
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> scala.MatchError: null
> at
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:122)
> at
> kafka.controller.ControllerChannelManager.addBroker(ControllerChannelManager.scala:74)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$4.apply(ReplicaStateMachine.scala:372)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$4.apply(ReplicaStateMachine.scala:372)
> at
> scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:372)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> at
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
> at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:843)
> at
> org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>
>


-- 
-- Guozhang


Re: Embedding zookeeper and kafka in java process.

2016-06-15 Thread Flavio Junqueira

> On 15 Jun 2016, at 21:56, Subhash Agrawal  wrote:
> 
> [2016-06-15 13:39:39,808] DEBUG [ZkClient-EventThread-24-localhost:2181] 
> [Channel manager on controller 0]: Controller 0 trying to connect to broker 0 
> (kafka.controller.ControllerChannelManager)


The controller isn't able to connect to itself (as a broker)? It looks like 
ZK is triggering the event just fine, but the controller is having some 
trouble seeing itself as a broker.

-Flavio

Re: Memory consumption of Kafka-examples Kafka-streams around 1.5 GB

2016-06-15 Thread Philippe Derome
Guozhang,

No two in particular; at first it was simply the last two that the target
would choose: SumLambdaIntegrationTest and WordCountLambdaIntegrationTest.
I tried excluding another couple and that was fine as well. There's one
Scala test that is included in the run, and I run it as well as 7/9 of the
Java ones.

I am not convinced by that explanation, because after 2-3 tests I'd think ZK
and Kafka should already be loaded in; anecdotally about 700MB is loaded by
then, so quite a bit is resident when running the next few tests, at least
as far as I can see.

The main thing for me is to be reassured that the order of magnitude of
memory consumption looks right to you. I find it a bit on the high side but
I won't argue that. So, all in all, I am satisfied with your answer.

On Wed, Jun 15, 2016 at 6:25 PM, Guozhang Wang  wrote:

> Hello Phillippe,
>
> I used to run the "SimpleBenchmark" on my laptop with 4GB also, and it
> usually used close to, but less than 1GB.
>
>
> https://www.codatlas.com/github.com/apache/kafka/HEAD/streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java
>
> Note that I need to bootstrap a real ZK instance and a Kafka instance in
> order to run that benchmark, and I think those two instances actually
> take up more memory than the Kafka Streams instance itself.
>
> There may be some extra memory overhead from the maven framework, but I
> would be surprised if that were a large amount.
>
> Which two test cases specifically are causing OOMs on your laptop?
>
> Guozhang
>
>
> On Tue, Jun 14, 2016 at 4:50 PM, Philippe Derome 
> wrote:
>
> > I am running "mvn test" as per tip from
> > https://github.com/confluentinc/examples/tree/master/kafka-streams
> > README.MD.
> > This uses embedded Kafka components from test alone (no ZK, Kafka,
> > schema-registry running).
> >
> > I monitor on OSX El Capitan (10.11.5) memory usage and it grows on Java
> > processes from nothing to about 1.3GB when it fails to execute last 2
> tests
> > with Java out of memory exceptions. Selecting 2 tests to avoid makes the
> > test pass but with my 4GB system, I cannot pass them all.
> >
> > Is that relatively large memory consumption to be expected on these test
> > cases?
> >
> > I'd like to run stand-alone from jar and I'll be able to do so by
> excluding
> > test cases.
> >
> > Fyi, on mailing list I see only 1 issue related to Streams and memory if
> > that's any relevant (KAFKA-3738).
> >
> > Phil
> >
>
>
>
> --
> -- Guozhang
>


[jira] [Commented] (KAFKA-3536) ReplicaFetcherThread should not log errors when leadership changes

2016-06-15 Thread Jason Rosenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332767#comment-15332767
 ] 

Jason Rosenberg commented on KAFKA-3536:


Which version are you seeing this in? (I'm seeing it just now in 0.8.2.2, 
after a server restart.)

> ReplicaFetcherThread should not log errors when leadership changes
> --
>
> Key: KAFKA-3536
> URL: https://issues.apache.org/jira/browse/KAFKA-3536
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Stig Rohde Døssing
>Priority: Minor
>
> When there is a leadership change, ReplicaFetcherThread will spam the log 
> with errors similar to the log snippet below.
> {noformat}
> [ReplicaFetcherThread-0-2], Error for partition [ticketupdate,7] to broker 
> 2:class kafka.common.NotLeaderForPartitionException 
> (kafka.server.ReplicaFetcherThread)
> {noformat}
> ReplicaFetcherThread/AbstractFetcherThread should log those exceptions at a 
> lower log level, since they don't actually indicate an error.
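A hedged Java sketch of the requested change (the real ReplicaFetcherThread is
Scala; this just illustrates the log-level split):

import org.apache.kafka.common.errors.NotLeaderForPartitionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class FetchErrorLogging {
    private static final Logger log = LoggerFactory.getLogger(FetchErrorLogging.class);

    static void onFetchError(String topicPartition, int broker, Exception e) {
        if (e instanceof NotLeaderForPartitionException) {
            // Expected, transient condition during leadership changes.
            log.info("Leadership changed for partition {} fetched from broker {}",
                    topicPartition, broker);
        } else {
            log.error("Error for partition {} to broker {}", topicPartition, broker, e);
        }
    }
}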



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Complexe Event Processing on top of KafkaStreams

2016-06-15 Thread Guozhang Wang
Hello Florian,

Thanks for your interest! As mentioned in our release notes, we are
considering adding SQL support (e.g. using Calcite) on top of Kafka
Streams as near-term future work.

My own experience with CEP originates entirely from a research project
called "Cayuga" from my grad school:
http://www.cs.cornell.edu/bigreddata/cayuga/, but from that limited
experience I personally think Kafka Streams may be a natural fit for CEP
use cases.

Let me know what you think.


Guozhang


On Wed, Jun 15, 2016 at 3:59 AM, Florian Hussonnois 
wrote:

> Hi Team Kafka,
>
> Currently, I'm working on an small library to implement "complex event
> processing" on top of Kafka Streams :
> https://github.com/fhussonnois/kafkastreams-cep
>
> The idea came from the flink-cep library and the project is based on the
> same research paper.
>
> I'm developing this project for fun. But I'm not an expert in CEP, so maybe
> I'm doing things wrong ^^
>
> I would like to share with you my work because I think this could interest
> the kafka community.
> The project is still in progress but things seem to be on a right way.
>
> I've already planned to add a support for KStream DSL.
>
> Also, I would like to know if you will plan to add an "external" module in
> order to add contributions without impacting the kafkastreams APIs?
>
> Please feel free to give me your feedback about my API.
>
> Thanks for you time and the amazing work you are doing on Kafka.
>
> Florian.
>
> --
> Florian HUSSONNOIS
> @fhussonnois
>



-- 
-- Guozhang


Re: Memory consumption of Kafka-examples Kafka-streams around 1.5 GB

2016-06-15 Thread Guozhang Wang
Hello Phillippe,

I used to run the "SimpleBenchmark" on my laptop with 4GB also, and it
usually used close to, but less than 1GB.

https://www.codatlas.com/github.com/apache/kafka/HEAD/streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java

Note that I need to bootstrap a real ZK instance and a Kafka instance in
order to run that benchmark, and I think those two instances are actually
taking the major memory usage than Kafka Streams instance itself.

There may be some extra memory overhead from maven framework but I would be
surprised if that is taking large amount.

Which two test cases specifically are causing OOMs on your laptop?

Guozhang


On Tue, Jun 14, 2016 at 4:50 PM, Philippe Derome  wrote:

> I am running "mvn test" as per tip from
> https://github.com/confluentinc/examples/tree/master/kafka-streams
> README.MD.
> This uses embedded Kafka components from test alone (no ZK, Kafka,
> schema-registry running).
>
> I monitor on OSX El Capitan (10.11.5) memory usage and it grows on Java
> processes from nothing to about 1.3GB when it fails to execute last 2 tests
> with Java out of memory exceptions. Selecting 2 tests to avoid makes the
> test pass but with my 4GB system, I cannot pass them all.
>
> Is that relatively large memory consumption to be expected on these test
> cases?
>
> I'd like to run stand-alone from jar and I'll be able to do so by excluding
> test cases.
>
> Fyi, on mailing list I see only 1 issue related to Streams and memory if
> that's any relevant (KAFKA-3738).
>
> Phil
>



-- 
-- Guozhang


[jira] [Commented] (KAFKA-3822) Kafka Consumer close() hangs indefinitely if Kafka Broker shutdown while connected

2016-06-15 Thread Alexander Cook (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332736#comment-15332736
 ] 

Alexander Cook commented on KAFKA-3822:
---

I got to try this out today, and you are correct. This only happens when 
enable.auto.commit=true. max.block.ms would be great. Would that cover 
consumer.poll as well? 

> Kafka Consumer close() hangs indefinitely if Kafka Broker shutdown while 
> connected
> --
>
> Key: KAFKA-3822
> URL: https://issues.apache.org/jira/browse/KAFKA-3822
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1, 0.10.0.0
> Environment: x86 Red Hat 6 (1 broker running zookeeper locally, 
> client running on a separate server)
>Reporter: Alexander Cook
>
> I am using the KafkaConsumer java client to consume messages. My application 
> shuts down smoothly if I am connected to a Kafka broker, or if I never 
> succeed at connecting to a Kafka broker, but if the broker is shut down while 
> my consumer is connected to it, consumer.close() hangs indefinitely. 
> Here is how I reproduce it: 
> 1. Start 0.9.0.1 Kafka Broker
> 2. Start consumer application and consume messages
> 3. Stop 0.9.0.1 Kafka Broker (ctrl-c or stop script)
> 4. Try to stop application...hangs at consumer.close() indefinitely. 
> I also see this same behavior using 0.10 broker and client. 
> This is my first bug reported to Kafka, so please let me know if I should be 
> following a different format. Thanks! 
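For reference, a hedged sketch of the standard wakeup-based shutdown pattern
for the poll loop (it does not by itself fix the close() hang reported here,
but it is the documented way to break out of poll() from another thread;
group and topic names are hypothetical):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class ShutdownSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));
        try {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                consumer.poll(1000); // throws WakeupException once wakeup() is called
            }
        } catch (WakeupException e) {
            // expected on shutdown
        } finally {
            consumer.close(); // may still hang if the broker is gone (this issue)
        }
    }
}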



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3849) Make consumer poll time in MirrorMaker configurable.

2016-06-15 Thread Ashish K Singh (JIRA)
Ashish K Singh created KAFKA-3849:
-

 Summary: Make consumer poll time in MirrorMaker configurable.
 Key: KAFKA-3849
 URL: https://issues.apache.org/jira/browse/KAFKA-3849
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.10.0.0
Reporter: Ashish K Singh
Assignee: Ashish K Singh
 Fix For: 0.10.0.1


MirrorMaker has the consumer poll timeout for the new consumer hard coded at 
1000 ms. This should be configurable, as it is for the old consumer. The 
default can stay at 1000 ms though.
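A tiny sketch of the change requested here (the property name below is
hypothetical, not necessarily the one the patch introduces):

import java.util.Properties;

public class PollTimeoutSketch {
    // Read the poll timeout from the MirrorMaker properties instead of
    // hard-coding 1000 ms; fall back to the old default.
    static long pollTimeoutMs(Properties mmProps) {
        return Long.parseLong(mmProps.getProperty("consumer.poll.timeout.ms", "1000"));
    }
}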



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3693) Race condition between highwatermark-checkpoint thread and handleLeaderAndIsrRequest at broker start-up

2016-06-15 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332684#comment-15332684
 ] 

Jun Rao commented on KAFKA-3693:


[~maysamyabandeh], that by itself may not be a bad idea. We will have to think 
through UpdateMetadataRequest as well since currently, we expect there is an 
UpdateMetadataRequest before the first LeaderAndIsrRequest on broker startup.

> Race condition between highwatermark-checkpoint thread and 
> handleLeaderAndIsrRequest at broker start-up
> ---
>
> Key: KAFKA-3693
> URL: https://issues.apache.org/jira/browse/KAFKA-3693
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Maysam Yabandeh
>
> Upon broker start-up, a race between the highwatermark-checkpoint thread 
> writing the replication-offset-checkpoint file and the 
> handleLeaderAndIsrRequest thread reading from it causes the highwatermark for 
> some partitions to be reset to 0. In the good case, this causes the replica 
> to truncate its entire log to 0 and hence initiate fetching of terabytes of 
> data from the lead broker, which sometimes leads to hours of downtime. In the 
> bad cases we observed, the reset offset can propagate to the 
> recovery-point-offset-checkpoint file, making a lead broker truncate the 
> file. This seems to have the potential to lead to data loss if the truncation 
> happens at both follower and leader brokers.
> This is the particular faulty scenario manifested in our tests:
> # The broker restarts and receive LeaderAndIsr from the controller
> # LeaderAndIsr message however does not contain all the partitions (probably 
> because other brokers were churning at the same time)
> # becomeLeaderOrFollower calls getOrCreatePartition and updates the 
> allPartitions with the partitions included in the LeaderAndIsr message {code}
>   def getOrCreatePartition(topic: String, partitionId: Int): Partition = {
> var partition = allPartitions.get((topic, partitionId))
> if (partition == null) {
>   allPartitions.putIfNotExists((topic, partitionId), new Partition(topic, 
> partitionId, time, this))
> {code}
> # replication-offset-checkpoint jumps in taking a snapshot of (the partial) 
> allReplicas' high watermark into replication-offset-checkpoint file {code}  
> def checkpointHighWatermarks() {
> val replicas = 
> allPartitions.values.map(_.getReplica(config.brokerId)).collect{case 
> Some(replica) => replica}{code} hence rewriting the previous highwatermarks.
> # Later becomeLeaderOrFollower calls makeLeaders and makeFollowers which read 
> the (now partial) file through Partition::getOrCreateReplica {code}
>   val checkpoint = 
> replicaManager.highWatermarkCheckpoints(log.dir.getParentFile.getAbsolutePath)
>   val offsetMap = checkpoint.read
>   if (!offsetMap.contains(TopicAndPartition(topic, partitionId)))
> info("No checkpointed highwatermark is found for partition 
> [%s,%d]".format(topic, partitionId))
> {code}
> We are not entirely sure whether the initial LeaderAndIsr message including a 
> subset of partitions is critical in making this race condition manifest or 
> not. But it is an important detail since it clarifies that a solution based 
> on not letting the highwatermark-checkpoint thread jumping in the middle of 
> processing a LeaderAndIsr message would not suffice.
> The solution we are thinking of is to force initializing allPartitions by the 
> partitions listed in the replication-offset-checkpoint (and perhaps 
> recovery-point-offset-checkpoint file too) when a server starts.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3059) ConsumerGroupCommand should allow resetting offsets for consumer groups

2016-06-15 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332660#comment-15332660
 ] 

Guozhang Wang commented on KAFKA-3059:
--

Chiming in here since Streams may need this feature as well for cleaning up (cc 
[~mjsax]): I think "all consumers in the group are shut down" is a reasonable 
requirement before using this admin command. In addition, there are cases where 
users only use Kafka for storing offsets but not for group management; so the 
validation on the server side would be: "if there is no group registry 
information, or if the group's member list is empty, proceed".

For overriding offsets I agree that the current OffsetCommit request with 
generation id equal to -1 would work. As for DeleteGroup vs. DeleteOffsets for 
deleting offsets, personally I feel the latter may be necessary since there are 
cases where users only want to delete some specific offset (but not all 
offsets), and delete-group will not naturally support that. In addition, if 
users really want to delete the whole group before waiting for it to be auto 
purged (with a default purging frequency of 5 minutes today) they can still use 
delete-offsets on all offsets.

One orthogonal point is that today we do not enforce "groupId" as a required 
config and the default value is the empty string, so if two consumers started 
with manual assignment but both forgot to set the group id, their overrides 
will interfere with each other; one possible solution may be to use "threadId" 
instead of the empty string as the default, since in cases where users do not 
care to specify a group id, they are usually just one instance starting on a 
single machine.
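A hedged sketch of the override path mentioned above: a Java consumer that
uses assign() (no group management) commits with generation id -1, so it can
overwrite a stopped group's offsets (group, topic, and offset below are
hypothetical):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetResetSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group"); // the group being reset; no live members
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp)); // no group join
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(42L)));
        }
    }
}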

> ConsumerGroupCommand should allow resetting offsets for consumer groups
> ---
>
> Key: KAFKA-3059
> URL: https://issues.apache.org/jira/browse/KAFKA-3059
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Jason Gustafson
>
> As discussed here:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201601.mbox/%3CCA%2BndhHpf3ib%3Ddsh9zvtfVjRiUjSz%2B%3D8umXm4myW%2BpBsbTYATAQ%40mail.gmail.com%3E
> * Given a consumer group, remove all stored offsets
> * Given a group and a topic, remove offset for group  and topic
> * Given a group, topic, partition and offset - set the offset for the 
> specified partition and group with the given value



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk8 #697

2016-06-15 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3059) ConsumerGroupCommand should allow resetting offsets for consumer groups

2016-06-15 Thread Henry Cai (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332649#comment-15332649
 ] 

Henry Cai commented on KAFKA-3059:
--

We would like the capability to move the offset to a certain time in the 
past. It doesn't have to be a precise point; moving to an offset a little 
before that past time is fine. I think the ListOffsetRequest gives that 
capability.

> ConsumerGroupCommand should allow resetting offsets for consumer groups
> ---
>
> Key: KAFKA-3059
> URL: https://issues.apache.org/jira/browse/KAFKA-3059
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Jason Gustafson
>
> As discussed here:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201601.mbox/%3CCA%2BndhHpf3ib%3Ddsh9zvtfVjRiUjSz%2B%3D8umXm4myW%2BpBsbTYATAQ%40mail.gmail.com%3E
> * Given a consumer group, remove all stored offsets
> * Given a group and a topic, remove offset for group  and topic
> * Given a group, topic, partition and offset - set the offset for the 
> specified partition and group with the given value



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-55: Secure quotas for authenticated users

2016-06-15 Thread Jun Rao
Hi, Rajini,

Thanks for the updated wiki. Overall, I like the new approach. It covers
the common use cases now, is extensible, and is backward compatible. A few
comments below.

1. It would be useful to describe a bit at the high level how the new
approach works. I think this can be summarized as follows. Quotas can be
set at <user>, <client-id> or <user, client-id> levels. For a given client
connection, the most specific quota matching the connection will be
applied. For example, if both a <user> quota and a <user, client-id> quota
match a connection, the <user, client-id> quota will be used. If more than
one quota at the same level (e.g., a quota on a user and another quota on a
client-id) matches the connection, the user level quota will be used, i.e.,
user quota takes precedence over client-id quota (see the sketch after
comment 4 below).

2. For the ZK data structure, would it be better to store the <user, client-id>
quota as the following? Then the format of the value in each path is the
same. The wiki also mentions that we want to include the original user name
in the ZK value. Could you describe that with an example?
// Zookeeper persistence path /clients/clientA/users/<user2>
{
"version":1,
"config": {
"producer_byte_rate":"10",
"consumer_byte_rate":"20"
}
}

3. Could you document the format change of the ZK value in
/config/changes/config_change_xxx, if any?

4. For the config command, could we specify the sub-quota like the
following, instead of in the config value? This seems more intuitive.

bin/kafka-configs  --zookeeper localhost:2181 --alter --add-config
'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-name
clientA --entity-type clients --entity-name user2 --entity-type users
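
To make the rule in point 1 concrete, here is a sketch in plain Java
(illustrative only, not Kafka code; all names are made up):

import java.util.HashMap;
import java.util.Map;

// Illustration of the precedence described in point 1: the most specific
// quota wins, and a user quota beats a client-id quota when both match.
public class QuotaResolver {
    private final Map<String, Long> userClientQuotas = new HashMap<>(); // key: user + "/" + clientId
    private final Map<String, Long> userQuotas = new HashMap<>();
    private final Map<String, Long> clientIdQuotas = new HashMap<>();

    public Long resolve(String user, String clientId) {
        Long quota = userClientQuotas.get(user + "/" + clientId); // most specific
        if (quota != null)
            return quota;
        quota = userQuotas.get(user);        // user takes precedence ...
        if (quota != null)
            return quota;
        return clientIdQuotas.get(clientId); // ... over client-id; may be null
    }
}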

Thanks,

Jun

On Wed, Jun 15, 2016 at 10:35 AM, Rajini Sivaram <
rajinisiva...@googlemail.com> wrote:

> Harsha,
>
> The sample configuration under
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users#KIP-55:SecureQuotasforAuthenticatedUsers-QuotaConfiguration
> shows
> the Zookeeper data for different scenarios.
>
>- *user1* (/users/user1 in ZK) has only user-level quotas
>- *user2* (/users/user2 in ZK) defines user-level quotas and sub-quotas
>for clients *clientA* and *clientB*. Other client-ids of *user2* share
>the remaining quota of *user2*.
>
>
> On Wed, Jun 15, 2016 at 5:30 PM, Harsha  wrote:
>
> > Rajini,
> >   How do sub-quotas work in the case of authenticated users?
> >   Where are we maintaining the relation between users and their
> >   client ids? Can you add an example of ZK data under /users?
> > Thanks,
> > Harsha
> >
> > On Mon, Jun 13, 2016, at 05:01 AM, Rajini Sivaram wrote:
> > > I have updated KIP-55 to reflect the changes from the discussions in
> the
> > > voting thread (
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg51610.html).
> > >
> > > Jun/Gwen,
> > >
> > > Existing client-id quotas will be used as default client-id quotas for
> > > users when no user quotas are configured - i.e., default user quota is
> > > unlimited and no user-specific quota override is specified. This
> enables
> > > user rate limits to be configured for ANONYMOUS if required in a
> cluster
> > > that has both PLAINTEXT and SSL/SASL. By default, without any user rate
> > > limits set, rate limits for client-ids will apply, retaining the
> current
> > > client-id quota configuration for single-user clusters.
> > >
> > > Zookeeper will have two paths /clients with client-id quotas that apply
> > > only when user quota is unlimited similar to now. And /users which
> > > persists
> > > user quotas for any user including ANONYMOUS.
> > >
> > > Comments and feedback are appreciated.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Wed, Jun 8, 2016 at 9:00 PM, Rajini Sivaram
> > >  > > > wrote:
> > >
> > > > Jun,
> > > >
> > > > Oops, sorry, I hadn't realized that the last note was on the discuss
> > > > thread. Thank you for pointing it out. I have sent another note for
> > voting.
> > > >
> > > >
> > > > On Wed, Jun 8, 2016 at 4:30 PM, Jun Rao  wrote:
> > > >
> > > >> Rajini,
> > > >>
> > > >> Perhaps it will be clearer if you start the voting in a new thread
> > (with
> > > >> VOTE in the subject).
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >> On Tue, Jun 7, 2016 at 1:55 PM, Rajini Sivaram <
> > > >> rajinisiva...@googlemail.com
> > > >> > wrote:
> > > >>
> > > >> > I would like to initiate the vote for KIP-55.
> > > >> >
> > > >> > The KIP details are here: KIP-55: Secure quotas for authenticated
> > users
> > > >> > <
> > > >> >
> > > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
> > > >> > >
> > > >> > .
> > > >> >
> > > >> > The JIRA  KAFKA-3492  <
> > https://issues.apache.org/jira/browse/KAFKA-3492
> > > >> > >has
> > > >> > a draft PR here: https://github.com/apache/kafka/pull/1256.
> > > 

[jira] [Commented] (KAFKA-3837) Report the name of the blocking thread when throwing ConcurrentModificationException

2016-06-15 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332632#comment-15332632
 ] 

Bharat Viswanadham commented on KAFKA-3837:
---

Hi Simon,
I will update the code to print the thread name. You can have a look into the 
discussion that happened in the code review of the pull request for more 
information.
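
For reference, a minimal sketch of the idea (illustrative only, not the actual 
patch): track the owning Thread object rather than just its id, so the blocker 
can be named in the exception message:
{code}
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: track the owning Thread object (instead of only its
// id) so acquire() can report the name of the thread holding the consumer.
public class SingleThreadAccessGuard {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private final AtomicInteger refcount = new AtomicInteger();

    public void acquire() {
        Thread current = Thread.currentThread();
        if (owner.get() != current && !owner.compareAndSet(null, current)) {
            Thread blocker = owner.get();
            throw new ConcurrentModificationException(
                "Consumer is not safe for multi-threaded access; currently held by thread "
                    + (blocker != null ? blocker.getName() : "unknown"));
        }
        refcount.incrementAndGet();
    }

    public void release() {
        if (refcount.decrementAndGet() == 0)
            owner.set(null);
    }
}
{code}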

> Report the name of the blocking thread when throwing 
> ConcurrentModificationException
> 
>
> Key: KAFKA-3837
> URL: https://issues.apache.org/jira/browse/KAFKA-3837
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Simon Cooper
>Assignee: Bharat Viswanadham
>Priority: Minor
>
> {{KafkaConsumer.acquire}} throws {{ConcurrentModificationException}} if the 
> current thread does not match the {{currentThread}} field. It would be 
> useful if the name of the other thread was included in the exception message 
> to help debug the problem when this exception occurs.
> As it stands, it can be really difficult to work out what's going wrong when 
> there's several threads all accessing the consumer at the same time, and your 
> existing exclusive access logic doesn't seem to be working as it should.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3837) Report the name of the blocking thread when throwing ConcurrentModificationException

2016-06-15 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated KAFKA-3837:
--
Status: Open  (was: Patch Available)

> Report the name of the blocking thread when throwing 
> ConcurrentModificationException
> 
>
> Key: KAFKA-3837
> URL: https://issues.apache.org/jira/browse/KAFKA-3837
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Simon Cooper
>Assignee: Bharat Viswanadham
>Priority: Minor
>
> {{KafkaConsumer.acquire}} throws {{ConcurrentModificationException}} if the 
> current thread does not match the {{currentThread}} field. It would be 
> useful if the name of the other thread was included in the exception message 
> to help debug the problem when this exception occurs.
> As it stands, it can be really difficult to work out what's going wrong when 
> there's several threads all accessing the consumer at the same time, and your 
> existing exclusive access logic doesn't seem to be working as it should.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3837) Report the name of the blocking thread when throwing ConcurrentModificationException

2016-06-15 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3837 started by Bharat Viswanadham.
-
> Report the name of the blocking thread when throwing 
> ConcurrentModificationException
> 
>
> Key: KAFKA-3837
> URL: https://issues.apache.org/jira/browse/KAFKA-3837
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Simon Cooper
>Assignee: Bharat Viswanadham
>Priority: Minor
>
> {{KafkaConsumer.acquire}} throws {{ConcurrentModificationException}} if the 
> current thread does not match the {{currentThread}} field. It would be 
> useful if the name of the other thread was included in the exception message 
> to help debug the problem when this exception occurs.
> As it stands, it can be really difficult to work out what's going wrong when 
> there's several threads all accessing the consumer at the same time, and your 
> existing exclusive access logic doesn't seem to be working as it should.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3693) Race condition between highwatermark-checkpoint thread and handleLeaderAndIsrRequest at broker start-up

2016-06-15 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332618#comment-15332618
 ] 

Maysam Yabandeh commented on KAFKA-3693:


Thinking more about this, the complexity seems to stem from the controller not 
explicitly distinguishing between the first LeaderAndIsrRequest, which has a 
very special meaning, and the others. Looking back at the 
[design|https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3],
 there was a *isInit* field envisioned in the LeaderAndIsrRequest to serve this 
purpose:
{code}
LeaderAndISRRequest {
  request_type_id : int16 // the request id
  version_id      : int16 // the version of this request
  client_id       : int32 // this can be the broker id of the controller
  ack_timeout     : int32 // the time in ms to wait for a response
  isInit          : byte  // whether this is the first command issued by a controller
  leaderAndISRMap : Map[(topic: String, partitionId: int32) => LeaderAndISR] // a map of LeaderAndISR
}
{code}
However, I do not seem to be able to track this field in 
[trunk|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/protocol/Protocol.java]:
{code}
public static final Schema LEADER_AND_ISR_REQUEST_V0 = new Schema(
    new Field("controller_id", INT32, "The controller id."),
    new Field("controller_epoch", INT32, "The controller epoch."),
    new Field("partition_states",
        new ArrayOf(LEADER_AND_ISR_REQUEST_PARTITION_STATE_V0)),
    new Field("live_leaders", new ArrayOf(LEADER_AND_ISR_REQUEST_LIVE_LEADER_V0)));

public static final Schema[] LEADER_AND_ISR_REQUEST =
    new Schema[] {LEADER_AND_ISR_REQUEST_V0};
{code}

Not sure why the *isInit* field did not make it from the design into the 
implementation, but it seems it would be pretty straightforward to make the 
brokers defensive against such corner cases if this field were available, i.e., 
reject the first LeaderAndIsrRequest message if it does not have isInit set. A 
sketch of such a guard follows.
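
To make that concrete, a minimal sketch of such a broker-side guard, assuming 
the isInit byte from the V3 design were carried in the request (the class and 
method names are made up):
{code}
// Hypothetical guard, assuming an isInit flag existed in the request: the
// broker refuses to act on any LeaderAndIsrRequest until it has seen the
// controller's initial (isInit) one, instead of acting on partial state.
public class LeaderAndIsrInitGuard {
    private boolean initialized = false;

    public synchronized boolean accept(boolean isInit) {
        if (!initialized && !isInit)
            return false; // reject: the first request must be the init request
        initialized = true;
        return true;
    }
}
{code}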

> Race condition between highwatermark-checkpoint thread and 
> handleLeaderAndIsrRequest at broker start-up
> ---
>
> Key: KAFKA-3693
> URL: https://issues.apache.org/jira/browse/KAFKA-3693
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Maysam Yabandeh
>
> Upon broker start-up, a race between the highwatermark-checkpoint thread 
> writing the replication-offset-checkpoint file and the handleLeaderAndIsrRequest 
> thread reading from it causes the high watermark for some partitions to be 
> reset to 0. In the good case, this causes the replica to truncate its entire 
> log to 0 and hence initiate fetching of terabytes of data from the lead broker, 
> which sometimes leads to hours of downtime. We have observed bad cases where 
> the reset offset can propagate to the recovery-point-offset-checkpoint file, 
> making a lead broker truncate the file. This seems to have the potential to 
> lead to data loss if the truncation happens at both follower and leader brokers.
> This is the particular faulty scenario manifested in our tests:
> # The broker restarts and receive LeaderAndIsr from the controller
> # LeaderAndIsr message however does not contain all the partitions (probably 
> because other brokers were churning at the same time)
> # becomeLeaderOrFollower calls getOrCreatePartition and updates the 
> allPartitions with the partitions included in the LeaderAndIsr message {code}
>   def getOrCreatePartition(topic: String, partitionId: Int): Partition = {
> var partition = allPartitions.get((topic, partitionId))
> if (partition == null) {
>   allPartitions.putIfNotExists((topic, partitionId), new Partition(topic, 
> partitionId, time, this))
> {code}
> # replication-offset-checkpoint jumps in taking a snapshot of (the partial) 
> allReplicas' high watermark into replication-offset-checkpoint file {code}  
> def checkpointHighWatermarks() {
> val replicas = 
> allPartitions.values.map(_.getReplica(config.brokerId)).collect{case 
> Some(replica) => replica}{code} hence rewriting the previous highwatermarks.
> # Later becomeLeaderOrFollower calls makeLeaders and makeFollowers which read 
> the (now partial) file through Partition::getOrCreateReplica {code}
>   val checkpoint = 
> replicaManager.highWatermarkCheckpoints(log.dir.getParentFile.getAbsolutePath)
>   val offsetMap = checkpoint.read
>

[jira] [Commented] (KAFKA-3848) Mirror Maker sometimes fails to mirror first batch of messages

2016-06-15 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332583#comment-15332583
 ] 

James Cheng commented on KAFKA-3848:


Better documentation might help. Unfortunately, setting {{auto.offset.reset}} 
to {{smallest}} could have some other undesirable effects. See 
https://issues.apache.org/jira/browse/KAFKA-3370?focusedCommentId=15299148&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15299148
 

> Mirror Maker sometimes fails to mirror first batch of messages
> --
>
> Key: KAFKA-3848
> URL: https://issues.apache.org/jira/browse/KAFKA-3848
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: James Clarke
>
> I am seeing an intermittent issue in Mirror Maker where the first batch of 
> messages are not always mirrored to the target cluster. All messages after 
> the first batch are mirrored.
> I have a github repo 
> ([jc/kafka-mirror-maker-test|https://github.com/jc/kafka-mirror-maker-test]) 
> which reproduces the issue using Confluent's docker containers (running 
> 0.10.0). However on our environment we are using our own kafka containers 
> running 0.9.0.1.
> Environment:
> - edge datacenter dc1. 1 zk server, 1 kafka server.
> - aggregate datacenter dc2. 1 zk server, 1 kafka server, 1 mirror maker.
> - kafka server set up to auto-create topics
> - Mirror Maker configured to mirror from dc1 to dc2 using a whitelist 
> containing both explicitly-listed topics and regex topics.
> Steps to reproduce:
> - Send message to a non-existent topic in dc1.
> - Send a second message to topic in dc1.
> Observed:
> - After first message the topic is not created in dc2.
> - After second message topic is present in dc2.
> - Consuming from topic in dc1 shows both messages.
> - Consuming from topic in dc2 shows only the second message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-15 Thread Grant Henke
Turns out we already have an InvalidRequestException:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L75-L98

We just don't map it in Errors.java so it results in an UNKNOWN error when
sent back to the client.

I will migrate the InvalidRequestException to the client package, add it to
Errors and use that to signify any protocol parsing/rule errors.
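
For concreteness, a rough sketch of what that mapping could look like (the
error code number and messages are placeholders, not the committed values):

// Standalone sketch of exposing a generic invalid-request error through an
// Errors-style mapping instead of falling back to UNKNOWN.
public enum SketchErrors {
    NONE((short) 0, null),
    UNKNOWN((short) -1, new RuntimeException("The server experienced an unexpected error.")),
    INVALID_REQUEST((short) 42, new RuntimeException(
        "The request was malformed or violated a protocol rule; details are logged on the broker."));

    private final short code;
    private final RuntimeException exception;

    SketchErrors(short code, RuntimeException exception) {
        this.code = code;
        this.exception = exception;
    }

    public short code() { return code; }
    public RuntimeException exception() { return exception; }
}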



On Wed, Jun 15, 2016 at 2:55 PM, Dana Powers  wrote:

> On Wed, Jun 15, 2016 at 12:19 PM, Ismael Juma  wrote:
> > Hi Grant,
> >
> > Comments below.
> >
> > On Wed, Jun 15, 2016 at 6:52 PM, Grant Henke 
> wrote:
> >>
> >> The one thing I want to avoid is too many super specific error codes. I
> am
> >> not sure how much of a problem it really is but in the case of wire
> >> protocol errors like multiple instances of the same topic, do you have
> any
> >> thoughts on the error? Should we make a generic InvalidRequest error and
> >> log the detailed message on the broker for client authors to debug?
> >>
> >
> > That is a good question. It would be good to get input from client
> > developers like Dana on this.
>
> I think generic error codes are fine if the wire protocol requirements
> are documented [i.e., no duplicate topics and partitions/replicas are
> either/or not both]. If I get a broker error at the protocol level
> that I don't understand, the first place I look is the protocol docs.
> It may cause a few more emails to the mailing lists asking for
> clarification, but I think those will be easier to triage than
> confused emails like "I said create topic with 10 partitions, but I
> only got 5???"
>
> -Dana
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Embedding zookeeper and kafka in java process.

2016-06-15 Thread Subhash Agrawal
Hi All,
I am embedding Kafka 0.10.0 and corresponding zookeeper in java process. In 
this process, I start zookeeper first and then wait for 10 seconds and
then start kafka. These are all running in the same process. Toward the end of 
kafka startup, I see following exception. It seems zookeeper is not able
to add the newly created kafka instance. Have you seen this error earlier?  I 
have only single node kafka.

Let me know if you have any suggestions. I will really appreciate any help on 
this.

Thanks
Subhash Agrawal.

[2016-06-15 13:39:39,616] INFO [Logserver_Starter] Registered broker 0 at path 
/brokers/ids/0 with addresses: PLAINTEXT -> EndPoint(localhost,8392,PLAINTEXT) 
(kafka.utils.ZkUtils)
[2016-06-15 13:39:39,617] WARN [Logserver_Starter] No meta.properties file 
under dir C:\development \newkafka-logs\meta.properties 
(kafka.server.BrokerMetadataCheckpoint)
[2016-06-15 13:39:39,627] INFO [ZkClient-EventThread-24-localhost:2181] New 
leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2016-06-15 13:39:39,629] INFO [ZkClient-EventThread-24-localhost:2181] 
[BrokerChangeListener on Controller 0]: Broker change listener fired for path 
/brokers/ids with children 0 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2016-06-15 13:39:39,638] INFO [Logserver_Starter] Kafka version : 0.10.0.0 
(org.apache.kafka.common.utils.AppInfoParser)
[2016-06-15 13:39:39,638] INFO [Logserver_Starter] Kafka commitId : 
b8642491e78c5a13 (org.apache.kafka.common.utils.AppInfoParser)
[2016-06-15 13:39:39,639] INFO [Logserver_Starter] [Kafka Server 0], started 
(kafka.server.KafkaServer)
[2016-06-15 13:39:39,806] INFO [ZkClient-EventThread-24-localhost:2181] 
[BrokerChangeListener on Controller 0]: Newly added brokers: 0, deleted 
brokers: , all live brokers: 0 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2016-06-15 13:39:39,808] DEBUG [ZkClient-EventThread-24-localhost:2181] 
[Channel manager on controller 0]: Controller 0 trying to connect to broker 0 
(kafka.controller.ControllerChannelManager)
[2016-06-15 13:39:39,818] ERROR [ZkClient-EventThread-24-localhost:2181] 
[BrokerChangeListener on Controller 0]: Error while handling broker changes 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)
scala.MatchError: null
at 
kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:122)
at 
kafka.controller.ControllerChannelManager.addBroker(ControllerChannelManager.scala:74)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$4.apply(ReplicaStateMachine.scala:372)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$4.apply(ReplicaStateMachine.scala:372)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:372)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:843)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
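
One general hardening for embedded setups like this (a sketch under
assumptions, not a diagnosis of the MatchError above): replace the fixed
10-second sleep with a poll of the ZooKeeper client port. Host, port, and
timeout below are placeholders.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Poll the ZooKeeper client port until it accepts connections before
// starting the Kafka broker in the same JVM.
public final class ZkReadiness {
    public static void awaitZooKeeper(String host, int port, long timeoutMs)
            throws InterruptedException, IOException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000);
                return; // port is open; safe to start the Kafka broker
            } catch (IOException e) {
                if (System.currentTimeMillis() > deadline)
                    throw e;
                Thread.sleep(200);
            }
        }
    }
}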



[jira] [Commented] (KAFKA-3848) Mirror Maker sometimes fails to mirror first batch of messages

2016-06-15 Thread James Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332534#comment-15332534
 ] 

James Clarke commented on KAFKA-3848:
-

Thanks for the quick reply. I think you might be right with 
{{auto.offset.reset}}. I'll perform some testing tonight.

I consumed from dc1 after running the full test and with {{--from-beginning}}. 
This is not isolated to wildcard topics because I produced directly to 10 
topics which were explicitly listed in the whitelist. I think it is just an 
edge case with newly created topics (via {{auto.create.topics.enable}}).

Assuming the issue is with the {{auto.offset.reset}} default behavior, and the 
consumer is detecting the new topic only after the first message is produced (and 
therefore consuming from offset 1 onwards), I suggest the documentation be 
updated to point out that {{auto.offset.reset}} should be set to {{smallest}}. 
Currently:

{quote}
Combining mirroring with the configuration auto.create.topics.enable=true makes 
it possible to have a replica cluster that will automatically create and 
replicate all data in a source cluster even as new topics are added.
{quote}

> Mirror Maker sometimes fails to mirror first batch of messages
> --
>
> Key: KAFKA-3848
> URL: https://issues.apache.org/jira/browse/KAFKA-3848
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: James Clarke
>
> I am seeing an intermittent issue in Mirror Maker where the first batch of 
> messages are not always mirrored to the target cluster. All messages after 
> the first batch are mirrored.
> I have a github repo 
> ([jc/kafka-mirror-maker-test|https://github.com/jc/kafka-mirror-maker-test]) 
> which reproduces the issue using Confluent's docker containers (running 
> 0.10.0). However on our environment we are using our own kafka containers 
> running 0.9.0.1.
> Environment:
> - edge datacenter dc1. 1 zk server, 1 kafka server.
> - aggregate datacenter dc2. 1 zk server, 1 kafka server, 1 mirror maker.
> - kafka server set up to auto-create topics
> - Mirror Maker configured to mirror from dc1 to dc2 using a whitelist 
> containing both explicitly-listed topics and regex topics.
> Steps to reproduce:
> - Send message to a non-existent topic in dc1.
> - Send a second message to topic in dc1.
> Observed:
> - After first message the topic is not created in dc2.
> - After second message topic is present in dc2.
> - Consuming from topic in dc1 shows both messages.
> - Consuming from topic in dc2 shows only the second message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3848) Mirror Maker sometimes fails to mirror first batch of messages

2016-06-15 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332529#comment-15332529
 ] 

James Cheng commented on KAFKA-3848:


Somewhat related to https://issues.apache.org/jira/browse/KAFKA-3370

> Mirror Maker sometimes fails to mirror first batch of messages
> --
>
> Key: KAFKA-3848
> URL: https://issues.apache.org/jira/browse/KAFKA-3848
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: James Clarke
>
> I am seeing an intermittent issue in Mirror Maker where the first batch of 
> messages are not always mirrored to the target cluster. All messages after 
> the first batch are mirrored.
> I have a github repo 
> ([jc/kafka-mirror-maker-test|https://github.com/jc/kafka-mirror-maker-test]) 
> which reproduces the issue using Confluent's docker containers (running 
> 0.10.0). However on our environment we are using our own kafka containers 
> running 0.9.0.1.
> Environment:
> - edge datacenter dc1. 1 zk server, 1 kafka server.
> - aggregate datacenter dc2. 1 zk server, 1 kafka server, 1 mirror maker.
> - kafka server set up to auto-create topics
> - Mirror Maker configured to mirror from dc1 to dc2 using a whitelist 
> containing both explicitly-listed topics and regex topics.
> Steps to reproduce:
> - Send message to a non-existent topic in dc1.
> - Send a second message to topic in dc1.
> Observed:
> - After first message the topic is not created in dc2.
> - After second message topic is present in dc2.
> - Consuming from topic in dc1 shows both messages.
> - Consuming from topic in dc2 shows only the second message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3818) Change Mirror Maker default assignment strategy to round robin

2016-06-15 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332524#comment-15332524
 ] 

Grant Henke commented on KAFKA-3818:


A few older threads mention that it's possible to get clumping (due to the hash 
on the RoundRobinAssignor). Does that problem still exist? Is that something we 
should fix before changing the default?

This thread discusses it recently: 
http://search-hadoop.com/m/uyzND135BcA1lXiM=Re+DISCUSS+KIP+49+Fair+Partition+Assignment+Strategy
{quote}
 - WRT roundrobin we later realized a significant flaw in the way we lay
   out partitions: we originally wanted to randomize the partition layout to
   reduce the likelihood of most partitions of the same topic from ending up
   on a given consumer which is important if you have a few very large topics.
   Unfortunately we used hashCode - which does a splendid job of clumping
   partitions from the same topic together :( We can probably just "fix" that
   in the new consumer's roundrobin assignor.
{quote}

And this older JIRA seems to describe the issue [~jjkoshy] is referring to: 
KAFKA-2019

[~jjkoshy] do you have any thoughts?


> Change Mirror Maker default assignment strategy to round robin
> --
>
> Key: KAFKA-3818
> URL: https://issues.apache.org/jira/browse/KAFKA-3818
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>
> It might make more sense to use round robin assignment by default for MM 
> since it gives a better balance between the instances, in particular when the 
> number of MM instances exceeds the typical number of partitions per topic. 
> There doesn't seem to be any need to keep range assignment since 
> copartitioning is not an issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3848) Mirror Maker sometimes fails to mirror first batch of messages

2016-06-15 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332480#comment-15332480
 ] 

James Cheng commented on KAFKA-3848:


This behavior may be related to the value of auto.offset.reset. Mirror Maker 
uses the default value of auto.offset.reset, which is "largest" (i.e., start 
receiving only the newest messages in the stream).

I suspect what is happening is this:
1) mirrormaker has a wildcard listening to dc1
2) First message gets sent to dc1, topic gets auto-created
3) mirrormaker gets notified that a new topic is created (because of the 
wildcard)
4) mirrormaker starts consuming from the topic at "largest". It connects to the 
end of the topic and waits for new messages to come in.
5) second message gets sent to dc1
6) mirrormaker sees the 2nd message.

You said that when you consume from the topic in dc1, you see both messages. 
How were you listening to that topic? I suspect that if you consume from dc1 
using a wildcarded consumer, you might see behavior similar to what you are 
seeing from Mirror Maker: your wildcarded consumer would see just the 2nd 
message.
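
As an illustration, the source-side consumer settings this implies, using the 
old-consumer property names that Mirror Maker reads in 0.9/0.10 (the connect 
string and group id are placeholders):
{code}
import java.util.Properties;

// Sketch of a Mirror Maker source-consumer config that also picks up newly
// auto-created topics from their first message; values are placeholders.
public class MirrorMakerConsumerProps {
    public static Properties sourceConsumerProps() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "dc1-zookeeper:2181");
        props.put("group.id", "mirrormaker-dc1-to-dc2");
        // Without this, a wildcard consumer attaches to a newly auto-created
        // topic at "largest" and misses the message that created the topic.
        props.put("auto.offset.reset", "smallest");
        return props;
    }
}
{code}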


> Mirror Maker sometimes fails to mirror first batch of messages
> --
>
> Key: KAFKA-3848
> URL: https://issues.apache.org/jira/browse/KAFKA-3848
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: James Clarke
>
> I am seeing an intermittent issue in Mirror Maker where the first batch of 
> messages are not always mirrored to the target cluster. All messages after 
> the first batch are mirrored.
> I have a github repo 
> ([jc/kafka-mirror-maker-test|https://github.com/jc/kafka-mirror-maker-test]) 
> which reproduces the issue using Confluent's docker containers (running 
> 0.10.0). However on our environment we are using our own kafka containers 
> running 0.9.0.1.
> Environment:
> - edge datacenter dc1. 1 zk server, 1 kafka server.
> - aggregate datacenter dc2. 1 zk server, 1 kafka server, 1 mirror maker.
> - kafka server set up to auto-create topics
> - Mirror Maker configured to mirror from dc1 to dc2 using a whitelist 
> containing both explicitly-listed topics and regex topics.
> Steps to reproduce:
> - Send message to a non-existent topic in dc1.
> - Send a second message to topic in dc1.
> Observed:
> - After first message the topic is not created in dc2.
> - After second message topic is present in dc2.
> - Consuming from topic in dc1 shows both messages.
> - Consuming from topic in dc2 shows only the second message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk7 #1362

2016-06-15 Thread Apache Jenkins Server
See 



Build failed in Jenkins: kafka-trunk-jdk8 #696

2016-06-15 Thread Apache Jenkins Server
See 

Changes:

[harsha] KAFKA-3830; getTGT() debug logging exposes confidential information

--
[...truncated 12504 lines...]
org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByAlias STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByAlias PASSED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByShortAlias 
STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByShortAlias 
PASSED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidConnector STARTED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidConnector PASSED

org.apache.kafka.connect.runtime.WorkerTest > testReconfigureConnectorTasks 
STARTED

org.apache.kafka.connect.runtime.WorkerTest > testReconfigureConnectorTasks 
PASSED

org.apache.kafka.connect.runtime.WorkerTest > testAddRemoveTask STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddRemoveTask PASSED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidTask STARTED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidTask PASSED

org.apache.kafka.connect.runtime.WorkerTest > testCleanupTasksOnStop STARTED

org.apache.kafka.connect.runtime.WorkerTest > testCleanupTasksOnStop PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testPollsInBackground STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testPollsInBackground PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommit STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommit PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitTaskFlushFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitTaskFlushFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitTaskSuccessAndFlushFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitTaskSuccessAndFlushFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitConsumerFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitConsumerFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTimeout 
STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTimeout 
PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testAssignmentPauseResume STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testAssignmentPauseResume PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testRewind STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testRewind PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupFollower STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupFollower PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testMetadata STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testMetadata PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment1 STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment1 PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment2 STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment2 PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testJoinLeaderCannotAssign STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testJoinLeaderCannotAssign PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testRejoinGroup STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testRejoinGroup PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupLeader STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupLeader PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartConnector STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testPutConnectorConfig STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testPutConnectorConfig PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinAssignment STARTED


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-15 Thread Dana Powers
On Wed, Jun 15, 2016 at 12:19 PM, Ismael Juma  wrote:
> Hi Grant,
>
> Comments below.
>
> On Wed, Jun 15, 2016 at 6:52 PM, Grant Henke  wrote:
>>
>> The one thing I want to avoid is too many super specific error codes. I am
>> not sure how much of a problem it really is but in the case of wire
>> protocol errors like multiple instances of the same topic, do you have any
>> thoughts on the error? Should we make a generic InvalidRequest error and
>> log the detailed message on the broker for client authors to debug?
>>
>
> That is a good question. It would be good to get input from client
> developers like Dana on this.

I think generic error codes are fine if the wire protocol requirements
are documented [i.e., no duplicate topics and partitions/replicas are
either/or not both]. If I get a broker error at the protocol level
that I don't understand, the first place I look is the protocol docs.
It may cause a few more emails to the mailing lists asking for
clarification, but I think those will be easier to triage than
confused emails like "I said create topic with 10 partitions, but I
only got 5???"

-Dana


[jira] [Created] (KAFKA-3848) Mirror Maker sometimes fails to mirror first batch of messages

2016-06-15 Thread James Clarke (JIRA)
James Clarke created KAFKA-3848:
---

 Summary: Mirror Maker sometimes fails to mirror first batch of 
messages
 Key: KAFKA-3848
 URL: https://issues.apache.org/jira/browse/KAFKA-3848
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.0.0, 0.9.0.1
Reporter: James Clarke


I am seeing an intermittent issue in Mirror Maker where the first batch of 
messages are not always mirrored to the target cluster. All messages after the 
first batch are mirrored.

I have a github repo 
([jc/kafka-mirror-maker-test|https://github.com/jc/kafka-mirror-maker-test]) 
which reproduces the issue using Confluent's docker containers (running 
0.10.0). However on our environment we are using our own kafka containers 
running 0.9.0.1.

Environment:

- edge datacenter dc1. 1 zk server, 1 kafka server.
- aggregate datacenter dc2. 1 zk server, 1 kafka server, 1 mirror maker.
- kafka server set up to auto-create topics
- Mirror Maker configured to mirror from dc1 to dc2 using a whitelist containing 
both explicitly-listed topics and regex topics.

Steps to reproduce:

- Send message to a non-existent topic in dc1.
- Send a second message to topic in dc1.

Observed:

- After first message the topic is not created in dc2.
- After second message topic is present in dc2.
- Consuming from topic in dc1 shows both messages.
- Consuming from topic in dc2 shows only the second message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-15 Thread Ismael Juma
Hi Grant,

Comments below.

On Wed, Jun 15, 2016 at 6:52 PM, Grant Henke  wrote:
>
> The one thing I want to avoid is too many super specific error codes. I am
> not sure how much of a problem it really is but in the case of wire
> protocol errors like multiple instances of the same topic, do you have any
> thoughts on the error? Should we make a generic InvalidRequest error and
> log the detailed message on the broker for client authors to debug?
>

That is a good question. It would be good to get input from client
developers like Dana on this.

> When looking at changing the patch, it looks like changing from CREATE
> to CREATE_TOPIC might pose some compatibility concerns. Is it alright if we
> leave it CREATE for now and revisit after KIP-4? It should not collide with
> the ACLs permission since we have control over that because its new.
>

Yes, I think it's fine to change it afterwards. However, I think we should
agree on a sensible plan during the Modify ACL request discussion to make
sure things make sense as a whole.

> The produce request timeout is very similar to this timeout. There is no
> bounds validation on -1. Anything less than 0 is essentially 0. We could
> validate the timeout too and return an InvalidRequest (or whatever is
> discussed above) error in this case too if you prefer.
>

Fair enough. Probably good to remain consistent (even if I prefer a
stricter approach).

Ismael


[GitHub] kafka pull request #1501: MINOR: Expose window store sequence number

2016-06-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1501


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3753) Add approximateNumEntries() to the StateStore interface for metrics reporting

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3753:
-
Assignee: Jeff Klukas  (was: Guozhang Wang)

> Add approximateNumEntries() to the StateStore interface for metrics reporting
> -
>
> Key: KAFKA-3753
> URL: https://issues.apache.org/jira/browse/KAFKA-3753
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jeff Klukas
>Assignee: Jeff Klukas
>Priority: Minor
>  Labels: api
> Fix For: 0.10.1.0
>
>
> As a developer building a Kafka Streams application, I'd like to have 
> visibility into what's happening with my state stores. How can I know if a 
> particular store is growing large? How can I know if a particular store is 
> frequently needing to hit disk?
> I'm interested to know if there are existing mechanisms for extracting this 
> information or if other people have thoughts on how we might approach this.
> I can't think of a way to provide metrics generically, so each state store 
> implementation would likely need to handle this separately. Given that the 
> default RocksDBStore will likely be the most-used, it would be a first target 
> for adding metrics.
> I'd be interested in knowing the total number of entries in the store, the 
> total size on disk and in memory, rates of gets and puts, and hit/miss ratio 
> for the MemoryLRUCache. Some of these numbers are likely calculable through 
> the RocksDB API, others may simply not be accessible.
> Would there be value to the wider community in having state stores register 
> metrics?
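
A minimal sketch of the kind of interface addition being discussed (the method 
name comes from the issue title; everything else is assumed):
{code}
// Sketch only: stores expose an approximate entry count for metrics
// reporting; "approximate" because engines like RocksDB only expose
// estimates (e.g. the rocksdb.estimate-num-keys property).
public interface StoreSizeReporting {
    String name();

    long approximateNumEntries();
}
{code}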



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3701) Expose KafkaStreams metrics in public API

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3701:
-
Assignee: (was: Guozhang Wang)

> Expose KafkaStreams metrics in public API
> -
>
> Key: KAFKA-3701
> URL: https://issues.apache.org/jira/browse/KAFKA-3701
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jeff Klukas
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> The Kafka clients expose their metrics registries through a `metrics` method 
> presenting an unmodifiable collection, but `KafkaStreams` does not expose its 
> registry. Currently, applications can access a StreamsMetrics instance via 
> the ProcessorContext within a Processor, but this limits flexibility.
> Having read-only access to a KafkaStreams.metrics() method would allow a 
> developer to define a health check for their application based on the metrics 
> that KafkaStreams is collecting. Or a developer might want to define a metric 
> in some other framework based on KafkaStreams' metrics.
> I am imagining that an application would build and register 
> KafkaStreams-based health checks after building a KafkaStreams instance but 
> before calling the start() method. Are metrics added to the registry at the 
> time a KafkaStreams instance is constructed, or only after calling the 
> start() method? If metrics are registered only after application startup, 
> then this approach may not be sufficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3836) KStreamReduce and KTableReduce should not pass nulls to Deserializers

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3836:
-
Assignee: (was: Guozhang Wang)

> KStreamReduce and KTableReduce should not pass nulls to Deserializers
> -
>
> Key: KAFKA-3836
> URL: https://issues.apache.org/jira/browse/KAFKA-3836
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Avi Flax
>Priority: Trivial
>
> As per [this 
> discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3ccahwhrru29jw4jgvhsijwbvlzb3bc6qz6pbh9tqcfbcorjk4...@mail.gmail.com%3e]
>  these classes currently pass null values along to Deserializers, so those 
> Deserializers need to handle null inputs and pass them through without 
> throwing. It would be better for these classes to simply not call the 
> Deserializers in this case; this would reduce the burden of implementers of 
> {{Deserializer}}.
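
A minimal sketch of the suggested behavior (names are made up; this is not 
Kafka code):
{code}
// Skip the inner deserializer entirely for null payloads, instead of
// forcing every Deserializer implementation to handle null itself.
public class NullSafeDeserializer<T> {
    public interface Inner<T> {
        T deserialize(byte[] data);
    }

    private final Inner<T> inner;

    public NullSafeDeserializer(Inner<T> inner) {
        this.inner = inner;
    }

    public T deserialize(byte[] data) {
        return data == null ? null : inner.deserialize(data);
    }
}
{code}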



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3535) Add metrics ability for streams serde components

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3535:
-
Assignee: (was: Guozhang Wang)

> Add metrics ability for streams serde components
> 
>
> Key: KAFKA-3535
> URL: https://issues.apache.org/jira/browse/KAFKA-3535
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.9.0.1
>Reporter: Michael Coon
>Priority: Minor
>  Labels: user-experience
>
> Add the ability to record metrics in the serializer/deserializer components. 
> As it stands, I cannot record latency/sensor metrics since the API does not 
> provide the context at the serde level. Exposing the ProcessorContext at 
> this level may not be the solution; perhaps change the configure method 
> to take a different config or init context and make the StreamsMetrics 
> available in that context along with config information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3545) Generalized Serdes for List/Map

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3545:
-
Assignee: (was: Guozhang Wang)

> Generalized Serdes for List/Map
> ---
>
> Key: KAFKA-3545
> URL: https://issues.apache.org/jira/browse/KAFKA-3545
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Greg Fodor
>Priority: Minor
>  Labels: api, newbie
> Fix For: 0.10.1.0
>
>
> In working with Kafka Streams I've found it's often the case I want to 
> perform a "group by" operation, where I repartition a stream based on a 
> foreign key and then do an aggregation of all the values into a single 
> collection, so the stream becomes one where each entry has a value that is a 
> serialized list of values that belonged to the key. (This seems unrelated to 
> the 'group by' operation talked about in KAFKA-3544.) Basically the same 
> typical group by operation found in systems like Cascading.
> In order to create these intermediate list values I needed to define custom 
> avro schemas that simply wrap the elements of interest into a list. It seems 
> desirable that there be some basic facility for constructing simple Serdes of 
> Lists/Maps/Sets of other types, potentially using avro's serialization under 
> the hood. If this existed in the core library it would also enable the 
> addition of higher level operations on streams that can use these Serdes to 
> perform simple operations like the "group by" example I mention.
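
One possible shape for such a facility, sketched as a length-prefixed list 
serializer (illustrative only, not Kafka code; the element serializer interface 
is made up):
{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.List;

// Length-prefixed List serializer built on a per-element serializer.
public class ListSerializer<T> {
    public interface ElementSerializer<T> {
        byte[] serialize(T element);
    }

    private final ElementSerializer<T> inner;

    public ListSerializer(ElementSerializer<T> inner) {
        this.inner = inner;
    }

    public byte[] serialize(List<T> data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeInt(out, data.size());      // element count
            for (T element : data) {
                byte[] bytes = inner.serialize(element);
                writeInt(out, bytes.length); // per-element length prefix
                out.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    private static void writeInt(ByteArrayOutputStream out, int v) throws IOException {
        out.write(ByteBuffer.allocate(4).putInt(v).array());
    }
}
{code}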



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3545) Generalized Serdes for List/Map

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3545:
-
Labels: api newbie  (was: api)

> Generalized Serdes for List/Map
> ---
>
> Key: KAFKA-3545
> URL: https://issues.apache.org/jira/browse/KAFKA-3545
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Greg Fodor
>Priority: Minor
>  Labels: api, newbie
> Fix For: 0.10.1.0
>
>
> In working with Kafka Streams I've found it's often the case I want to 
> perform a "group by" operation, where I repartition a stream based on a 
> foreign key and then do an aggregation of all the values into a single 
> collection, so the stream becomes one where each entry has a value that is a 
> serialized list of values that belonged to the key. (This seems unrelated to 
> the 'group by' operation talked about in KAFKA-3544.) Basically the same 
> typical group by operation found in systems like Cascading.
> In order to create these intermediate list values I needed to define custom 
> avro schemas that simply wrap the elements of interest into a list. It seems 
> desirable that there be some basic facility for constructing simple Serdes of 
> Lists/Maps/Sets of other types, potentially using avro's serialization under 
> the hood. If this existed in the core library it would also enable the 
> addition of higher level operations on streams that can use these Serdes to 
> perform simple operations like the "group by" example I mention.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3595:
-
Assignee: (was: Guozhang Wang)

> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Priority: Minor
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently state store replication always goes through a compacted kafka topic. 
> For some state stores, e.g. JoinWindow, there are no duplicates in the store, 
> so there is not much benefit in using a compacted topic.
> The problem with using a compacted topic is that the records can stay on the 
> kafka broker forever. In my use case, my key is ad_id, which is incrementing 
> all the time and not bounded, so I am worried the disk space on the broker for 
> that topic will grow forever.
> I think we either need the capability to purge the compacted records on 
> broker, or allow us to specify different compact option for state store 
> replication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3752:
-
Assignee: (was: Guozhang Wang)

> Provide a way for KStreams to recover from unclean shutdown
> ---
>
> Key: KAFKA-3752
> URL: https://issues.apache.org/jira/browse/KAFKA-3752
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>  Labels: architecture
>
> If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM 
> Killer), it may leave behind lock files and fail to recover.
> It would be useful to have an option (say --force) to tell KStreams to 
> proceed even if it finds old LOCK files.
> {noformat}
> [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in 
> thread [StreamThread-1]:  
> (org.apache.kafka.streams.processor.internals.StreamThread:583)
> org.apache.kafka.streams.errors.ProcessorStateException: Error while creating 
> the state manager
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
> Caused by: 

[jira] [Updated] (KAFKA-3576) Unify KStream and KTable API

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3576:
-
Assignee: Damian Guy  (was: Guozhang Wang)

> Unify KStream and KTable API
> 
>
> Key: KAFKA-3576
> URL: https://issues.apache.org/jira/browse/KAFKA-3576
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Damian Guy
>  Labels: api
> Fix For: 0.10.1.0
>
>
> For KTable aggregations, it has a pattern of 
> {{table.groupBy(...).aggregate(...)}}, and the data is repartitioned in an 
> inner topic based on the selected key in {{groupBy(...)}}.
> For KStream aggregations, though, it has a pattern of 
> {{stream.selectKey(...).through(...).aggregateByKey(...)}}. In other words, 
> users need to manually use a topic to repartition data, and the syntax is a 
> bit different from KTable as well.
> h2. Goal
> To have similar APIs for aggregations of KStream and KTable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3741) KStream config for changelog min.in.sync.replicas

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3741:
-
Assignee: (was: Guozhang Wang)

> KStream config for changelog min.in.sync.replicas
> -
>
> Key: KAFKA-3741
> URL: https://issues.apache.org/jira/browse/KAFKA-3741
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>  Labels: api
>
> Kafka Streams currently allows you to specify a replication factor for 
> changelog and repartition topics that it creates.  It should also allow you 
> to specify min.in.sync.replicas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
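
A minimal sketch of the requested knob next to the existing one; the
min.insync.replicas line is hypothetical and does not exist yet:

{code:java}
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Supported today: replication factor for internal changelog/repartition topics.
props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
// Proposed by this ticket (hypothetical, no such Streams config yet):
// props.put("min.insync.replicas", 2);
{code}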


[jira] [Updated] (KAFKA-3770) KStream job should be able to specify linger.ms

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3770:
-
Assignee: Greg Fodor  (was: Guozhang Wang)

> KStream job should be able to specify linger.ms
> ---
>
> Key: KAFKA-3770
> URL: https://issues.apache.org/jira/browse/KAFKA-3770
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Greg Fodor
>Assignee: Greg Fodor
>
> The default linger.ms of 100ms hardcoded into the StreamsConfig class is 
> problematic for jobs that have lots of tasks, since this latency can accrue. 
> It seems useful to be able to override the linger.ms in the StreamsConfig. 
> Attached is a PR which allows this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
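
With the proposed change, the override might look like the sketch below;
today the hardcoded 100ms default wins regardless:

{code:java}
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
// Hypothetical until the PR lands: a user-supplied linger.ms for the
// internal producers instead of the hardcoded 100ms default.
props.put(ProducerConfig.LINGER_MS_CONFIG, "5");
{code}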


[jira] [Updated] (KAFKA-3708) Rethink exception handling in KafkaStreams

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3708:
-
Assignee: (was: Guozhang Wang)

> Rethink exception handling in KafkaStreams
> --
>
> Key: KAFKA-3708
> URL: https://issues.apache.org/jira/browse/KAFKA-3708
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> As of 0.10.0.0, the worker threads (i.e. {{StreamThreads}}) can possibly 
> encounter the following runtime exceptions:
> 1) {{consumer.poll()}} could throw KafkaException if some of the 
> configurations are not accepted, such as topics not authorized to read / write 
> (security), session-timeout value not valid, etc; these exceptions will be 
> thrown in the first ever {{poll()}}.
> 2) {{task.addRecords()}} could throw KafkaException (most likely 
> SerializationException) if the deserialization fails.
> 3) {{task.process() / punctuate()}} could throw various KafkaException; for 
> example, serialization / deserialization errors, state storage operation 
> failures (RocksDBException, for example),  producer sending failures, etc.
> 4) {{maybeCommit / commitAll / commitOne}} could throw various Exceptions if 
> the flushing of state store fails, and when {{consumer.commitSync}} throws 
> exceptions other than {{CommitFailedException}}.
> For all the above 4 cases, KafkaStreams does not capture and handle them, but 
> exposes them to users, and lets users handle them via 
> {{KafkaStreams.setUncaughtExceptionHandler}}. We need to re-think whether the 
> library should just handle these cases without exposing them to users, killing 
> the threads / migrating tasks to others, since they are all not recoverable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
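
As things stand, the application has to install the handler itself; a minimal
sketch (builder, props, and log are assumed to exist):

{code:java}
KafkaStreams streams = new KafkaStreams(builder, props);
streams.setUncaughtExceptionHandler((thread, throwable) ->
    // Without a handler, a dying StreamThread disappears silently; the
    // application must decide whether to shut down, alert, etc.
    log.error("Stream thread {} died", thread.getName(), throwable));
streams.start();
{code}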


[jira] [Updated] (KAFKA-3729) Auto-configure non-default SerDes passed alongside the topology builder

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3729:
-
Labels: api newbie  (was: api)

>  Auto-configure non-default SerDes passed alongside the topology builder
> 
>
> Key: KAFKA-3729
> URL: https://issues.apache.org/jira/browse/KAFKA-3729
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Fred Patton
>  Labels: api, newbie
>
> From Guozhang Wang:
> "Only default serdes provided through configs are auto-configured today. But 
> we could auto-configure other serdes passed alongside the topology builder as 
> well."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3729) Auto-configure non-default SerDes passed alongside the topology builder

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3729:
-
Assignee: (was: Guozhang Wang)

>  Auto-configure non-default SerDes passed alongside the topology builder
> 
>
> Key: KAFKA-3729
> URL: https://issues.apache.org/jira/browse/KAFKA-3729
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Fred Patton
>  Labels: api, newbie
>
> From Guozhang Wang:
> "Only default serdes provided through configs are auto-configured today. But 
> we could auto-configure other serdes passed alongside the topology builder as 
> well."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3543) Allow a variant of transform() which can emit multiple values

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3543:
-
Assignee: (was: Guozhang Wang)

> Allow a variant of transform() which can emit multiple values
> -
>
> Key: KAFKA-3543
> URL: https://issues.apache.org/jira/browse/KAFKA-3543
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Greg Fodor
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Right now it seems that if you want to apply an arbitrary stateful 
> transformation to a stream, you either have to use a TransformerSupplier or 
> ProcessorSupplier passed to transform() or process(). The custom processor will 
> allow you to emit multiple new values, but the process() method currently 
> terminates that branch of the topology so you can't apply additional data 
> flow. transform() lets you continue the data flow, but forces you to emit a 
> single value for every input value.
> (It actually doesn't quite force you to do this, since you can hold onto the 
> ProcessorContext and emit multiple, but that's probably not the ideal way to 
> do it :))
> It seems desirable to somehow allow a transformation that emits multiple 
> values per input value. I'm not sure of the best way to factor this inside of 
> the current TransformerSupplier/Transformer architecture in a way that is 
> clean and efficient -- currently I'm doing the workaround above of just 
> calling forward() myself on the context and actually emitting dummy values 
> which are filtered out downstream.
> -
> It is worth considering adding a new flatTransform function as 
> {code}
> <K1, V1> KStream<K1, V1> transform(TransformerSupplier<? super K, ? super V, Iterable<KeyValue<K1, V1>>> transformerSupplier, String... stateStoreNames)
> {code}
> which is essentially the same as
> {code} transform().flatMap() {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
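
The forward()-based workaround mentioned above, sketched (types and the dummy
sentinel are illustrative; a downstream filter() drops the dummies):

{code:java}
TransformerSupplier<String, String, KeyValue<String, String>> supplier = () ->
    new Transformer<String, String, KeyValue<String, String>>() {
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) { this.context = context; }

        @Override
        public KeyValue<String, String> transform(String key, String line) {
            for (String word : line.split(" "))
                context.forward(key, word);          // the real outputs
            return KeyValue.pair(key, "__dummy__");  // filtered out downstream
        }

        @Override
        public KeyValue<String, String> punctuate(long timestamp) { return null; }

        @Override
        public void close() { }
    };
{code}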


[jira] [Created] (KAFKA-3847) Connect tasks should not share a producer

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)
Ewen Cheslack-Postava created KAFKA-3847:


 Summary: Connect tasks should not share a producer
 Key: KAFKA-3847
 URL: https://issues.apache.org/jira/browse/KAFKA-3847
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 0.10.0.0
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Critical
 Fix For: 0.10.1.0


Currently the tasks share a producer. This is nice in terms of potentially 
coalescing requests to the same broker, keeping port usage reasonable, 
minimizing the # of connections to brokers (which is nice for brokers, not so 
important for connect itself). But it also means we unnecessarily tie tasks to 
each other in other ways -- e.g. when one needs to flush, we effectively 
block it on other connectors' data being produced and acked.

Given that we allocate a consumer per sink, a lot of the arguments for sharing 
a producer effectively go away. We should decouple the tasks by using a 
separate producer for each task (or, at a minimum, for each connector's tasks).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3400) Topic stop working / can't describe topic

2016-06-15 Thread Ashish K Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish K Singh updated KAFKA-3400:
--
Resolution: Cannot Reproduce
Status: Resolved  (was: Patch Available)

Resolving this as it could not be reproduced. Please re-open with more details 
if this is still an issue.

> Topic stop working / can't describe topic
> -
>
> Key: KAFKA-3400
> URL: https://issues.apache.org/jira/browse/KAFKA-3400
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Tobias
>Assignee: Ashish K Singh
> Fix For: 0.10.1.0
>
>
> we are seeing an issue where we intermittently (every couple of hours) get an 
> error with certain topics. They stop working and producers give a 
> LeaderNotFoundException.
> When we then try to use kafka-topics.sh to describe the topic we get the 
> error below.
> Error while executing topic command : next on empty iterator
> {{
> [2016-03-15 17:30:26,231] ERROR java.util.NoSuchElementException: next on 
> empty iterator
>   at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
>   at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
>   at scala.collection.IterableLike$class.head(IterableLike.scala:91)
>   at scala.collection.AbstractIterable.head(Iterable.scala:54)
>   at 
> kafka.admin.TopicCommand$$anonfun$describeTopic$1.apply(TopicCommand.scala:198)
>   at 
> kafka.admin.TopicCommand$$anonfun$describeTopic$1.apply(TopicCommand.scala:188)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at kafka.admin.TopicCommand$.describeTopic(TopicCommand.scala:188)
>   at kafka.admin.TopicCommand$.main(TopicCommand.scala:66)
>   at kafka.admin.TopicCommand.main(TopicCommand.scala)
>  (kafka.admin.TopicCommand$)
> }}
> if we delete the topic, then it will start to work again for a while
> We can't see anything obvious in the logs but are happy to provide them if needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3820) Provide utilities for tracking source offsets

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3820:
-
Labels: needs-kip  (was: )

> Provide utilities for tracking source offsets
> -
>
> Key: KAFKA-3820
> URL: https://issues.apache.org/jira/browse/KAFKA-3820
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ewen Cheslack-Postava
>Assignee: Liquan Pei
>Priority: Minor
>  Labels: needs-kip
>
> OffsetStorageReader is not (and is not expected to be) immediately updated 
> when a SourceRecord is returned from poll(). However, this can be a bit 
> confusing to connector developers, as they may return that data and then expect 
> a subsequent read from OffsetStorageReader to match it. In other words, 
> rather than tracking which offset they are at themselves in variables 
> maintained by the task implementation, the connector developer expected 
> OffsetStorageReader to do this for them.
> Part of the confusion comes from the fact that data is sent asynchronously 
> after returned from poll(), which explains the semantics we have. However, it 
> does also mean many connectors have similarly structured code where they keep 
> track of the current offset themselves. It might be nice to provide some 
> utilities, probably through the Context object, to get the last returned 
> offset for each source partition being processed by a task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
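
The hand-rolled pattern this ticket wants to factor out looks roughly like
this in a task today (fetchSince, lastPosition, sourceId, and the map keys
are illustrative):

{code:java}
@Override
public List<SourceRecord> poll() throws InterruptedException {
    List<SourceRecord> records = new ArrayList<>();
    for (Event event : fetchSince(lastPosition)) {   // the task tracks its own offset
        lastPosition = event.position();
        records.add(new SourceRecord(
                Collections.singletonMap("source", sourceId),        // source partition
                Collections.singletonMap("position", lastPosition),  // source offset
                "my-topic", Schema.STRING_SCHEMA, event.payload()));
    }
    return records;
}
{code}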


[jira] [Updated] (KAFKA-2483) Add support for batch/scheduled Copycat connectors

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2483:
-
Labels: needs-kip  (was: )

> Add support for batch/scheduled Copycat connectors
> --
>
> Key: KAFKA-2483
> URL: https://issues.apache.org/jira/browse/KAFKA-2483
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
>  Labels: needs-kip
>
> A few connectors may not perform well if run continuously; for example, HDFS 
> may not cope well with a task holding a file open for very long periods of time.
> These connectors will work better if they can schedule themselves to be 
> executed periodically. Note that this cannot currently be implemented by the 
> connector itself because sink connectors get data delivered to them as it 
> streams in. However, the combination of KAFKA-2481 and KAFKA-2482 would make 
> it possible, if inconvenient, for the task itself to implement this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3819) Provide utilities for polling source connectors

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3819:
-
Labels: needs-kip  (was: )

> Provide utilities for polling source connectors
> ---
>
> Key: KAFKA-3819
> URL: https://issues.apache.org/jira/browse/KAFKA-3819
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
>  Labels: needs-kip
>
> Source connectors that need to poll for data are currently responsible for 
> managing their own sleeping/backoff if they don't have any new data 
> available. This is becoming a very common pattern. It's also easy to 
> implement it incorrectly, e.g. by using Thread.sleep and not properly 
> interrupting on stop().
> We should probably provide some utilities, maybe just exposed via the Context 
> object, to implement this for connector developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
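
A sketch of the stop()-aware backoff such a utility would encapsulate, using
a latch rather than a bare Thread.sleep() (names are illustrative):

{code:java}
private final CountDownLatch shutdownLatch = new CountDownLatch(1);

@Override
public List<SourceRecord> poll() throws InterruptedException {
    List<SourceRecord> records = fetchAvailable();   // illustrative
    if (records.isEmpty()) {
        // Wakes immediately if stop() fires during the backoff.
        shutdownLatch.await(1, TimeUnit.SECONDS);
        return null;   // null tells the framework there is no data this round
    }
    return records;
}

@Override
public void stop() {
    shutdownLatch.countDown();
}
{code}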


[jira] [Updated] (KAFKA-3815) Support command line arguments in Kafka Connect distributed worker

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3815:
-
Labels: needs-kip  (was: )

> Support command line arguments in Kafka Connect distributed worker
> --
>
> Key: KAFKA-3815
> URL: https://issues.apache.org/jira/browse/KAFKA-3815
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Randall Hauch
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
>  Labels: needs-kip
>
> Change the Kafka Connect distributed worker so that one connector could be 
> configured via the command line. This would make it much easier to define 
> immutable containers (e.g., Docker, Kubernetes), where each container runs a 
> single distributed worker with a single configured connector. A "force" flag 
> might specify whether any existing configuration could be overwritten by the 
> configuration passed via the command line.
> In fact, distributed environments that run immutable containers, especially 
> Kubernetes and OpenShift, would benefit greatly from being able to run each 
> Kafka Connect connector in one or more containers that are configured exactly 
> the same way and running as a single Kafka Connect group. Because the Kafka 
> Connect group has only a single configured connector, the group experiences 
> no unnecessary rebalances that would normally occur in other topologies with 
> multiple connectors deployed to one Kafka Connect group.
> Ideally, the distributed worker could also be run in read-only mode so that 
> the connector configuration cannot be changed via the REST API. This would 
> only help to reinforce the connector as being immutable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2378) Add Copycat embedded API

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2378:
-
Labels: needs-kip  (was: )

> Add Copycat embedded API
> 
>
> Key: KAFKA-2378
> URL: https://issues.apache.org/jira/browse/KAFKA-2378
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> Much of the required Copycat API will exist from previous patches since any 
> main() method will need to do very similar operations. However, integrating 
> with any other Java code may require additional API support.
> For example, one of the use cases when integrating with any stream processing 
> application will require knowing which topics will be written to. We will 
> need to add APIs to expose the topics a registered connector is writing to so 
> they can be consumed by a stream processing task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3254) Add Connect storage topic prefix for easier configuration

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3254:
-
Labels: needs-kip  (was: )

> Add Connect storage topic prefix for easier configuration
> -
>
> Key: KAFKA-3254
> URL: https://issues.apache.org/jira/browse/KAFKA-3254
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
>  Labels: needs-kip
>
> Currently Connect depends on two topics: one for storing connector 
> configuration, and another for source connector offsets. In KAFKA-3093, we add 
> another topic to store connector/task statuses, and later it might make sense 
> to use additional topics (e.g. for metrics?). To make it easier for users to 
> configure these topics, it would make sense to have a configuration option 
> "storage.topic.prefix" (or similar) to basically serve as a namespace for the 
> Connect instance. Connect could then create the topics automatically with a 
> default suffix for each storage type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
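
For illustration, the worker config today versus the proposed shorthand (the
prefix property name is the ticket's suggestion, not an existing config):

{code}
# Today: each storage topic is configured individually
offset.storage.topic=connect-cluster-offsets
config.storage.topic=connect-cluster-configs
status.storage.topic=connect-cluster-statuses

# Proposed: one prefix, with a default suffix derived per storage type
# storage.topic.prefix=connect-cluster
{code}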


[jira] [Updated] (KAFKA-2376) Add Copycat metrics

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2376:
-
Labels: needs-kip  (was: )

> Add Copycat metrics
> ---
>
> Key: KAFKA-2376
> URL: https://issues.apache.org/jira/browse/KAFKA-2376
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> Copycat needs good metrics for monitoring since that will be the primary 
> insight into the health of connectors as they copy data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3813) Let SourceConnector implementations access the offset reader

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3813:
-
Labels: needs-kip  (was: )

> Let SourceConnector implementations access the offset reader
> 
>
> Key: KAFKA-3813
> URL: https://issues.apache.org/jira/browse/KAFKA-3813
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Randall Hauch
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> When a source connector is started, having access to the 
> {{OffsetStorageReader}} would allow it to more intelligently configure its 
> tasks based upon the stored offsets. (Currently only the {{SourceTask}} 
> implementations can access the offset reader (via the {{SourceTaskContext}}), 
> but the {{SourceConnector}} does not have access to the offset reader.)
> Of course, accessing the stored offsets is not likely to be useful for sink 
> connectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
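
For contrast, what a task can already do through its context -- the capability
this ticket would extend to the connector (partition map keys are illustrative):

{code:java}
// Inside a SourceTask, available today via SourceTaskContext:
OffsetStorageReader reader = context.offsetStorageReader();
Map<String, Object> lastOffset =
        reader.offset(Collections.singletonMap("source", "my-partition"));

// Hypothetical per this ticket: the same reader handed to
// SourceConnector.start() so task configs can be derived from stored offsets.
{code}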


[jira] [Updated] (KAFKA-3821) Allow Kafka Connect source tasks to produce offset without writing to topics

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3821:
-
Labels: needs-kip  (was: )

> Allow Kafka Connect source tasks to produce offset without writing to topics
> 
>
> Key: KAFKA-3821
> URL: https://issues.apache.org/jira/browse/KAFKA-3821
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Randall Hauch
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for 
> a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown 
> number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a 
> database). Once the task completes those records, the connector wants to 
> update the offsets (e.g., the snapshot is complete) but has no more records 
> to be written to a topic. With this change, the task could simply supply an 
> updated offset.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3073) KafkaConnect should support regular expression for topics

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3073:
-
Labels: needs-kip  (was: )

> KafkaConnect should support regular expression for topics
> -
>
> Key: KAFKA-3073
> URL: https://issues.apache.org/jira/browse/KAFKA-3073
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Liquan Pei
>  Labels: needs-kip
> Fix For: 0.10.1.0
>
>
> KafkaConsumer supports both a list of topics or a pattern when subscribing. 
> KafkaConnect only supports a list of topics, which is not just more of a 
> hassle to configure - it also requires more maintenance.
> I suggest adding topics.pattern as a new configuration and letting users 
> choose between 'topics' or 'topics.pattern'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
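
A connector config sketch of the proposal (topics.pattern is the suggested
new property, not an existing one):

{code}
# Supported today: an explicit, maintained-by-hand list
topics=orders,payments,refunds

# Proposed alternative, mutually exclusive with 'topics':
# topics.pattern=metrics-.*
{code}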


[jira] [Updated] (KAFKA-3487) Support per-connector/per-task classloaders in Connect

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3487:
-
Labels: needs-kip  (was: )

> Support per-connector/per-task classloaders in Connect
> --
>
> Key: KAFKA-3487
> URL: https://issues.apache.org/jira/browse/KAFKA-3487
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Liquan Pei
>Priority: Critical
>  Labels: needs-kip
>
> Currently we just use the default ClassLoader in Connect. However, this 
> limits how we can compatibly load conflicting connector plugins. Ideally we 
> would use a separate class loader per connector/task that is instantiated to 
> avoid potential conflicts.
> Note that this also opens up options for other ways to provide jars to 
> instantiate connectors. For example, Spark uses this to dynamically publish 
> classes defined in the REPL and load them via URL: 
> https://ardoris.wordpress.com/2014/03/30/how-spark-does-class-loading/ But 
> much simpler examples (including a URL with the connector class instead of 
> just the class name) are also possible and could be a nice way to support 
> more dynamic sets of connectors, multiple versions of the same connector, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
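
A minimal sketch of per-connector isolation with a dedicated class loader;
paths and class names are illustrative, and this is not how Connect loads
plugins today:

{code:java}
Connector loadIsolated(String jarPath, String className) throws Exception {
    URL[] jars = { new File(jarPath).toURI().toURL() };
    // A child-first loader could go further; a plain URLClassLoader is shown
    // for brevity. Conflicting classes in other connectors' jars stay invisible.
    URLClassLoader loader = new URLClassLoader(jars, Connector.class.getClassLoader());
    Class<?> clazz = Class.forName(className, true, loader);
    return (Connector) clazz.newInstance();
}
{code}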


[jira] [Updated] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3209:
-
Labels: needs-kip  (was: )

> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Neha Narkhede
>Priority: Critical
>  Labels: needs-kip
> Fix For: 0.10.1.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
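
One possible shape for such a transform -- purely hypothetical and subject to
the KIP; no such interface exists yet:

{code:java}
public interface Transformation<R extends ConnectRecord<R>> {
    /** Return the (possibly modified) record, or null to drop it entirely. */
    R apply(R record);
}
{code}

A chain of these could run between a source connector and the Kafka producer
(e.g. a PII scrubber or an event-type filter), and symmetrically between the
consumer and a sink connector.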


[jira] [Created] (KAFKA-3846) Connect record types should include timestamps

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)
Ewen Cheslack-Postava created KAFKA-3846:


 Summary: Connect record types should include timestamps
 Key: KAFKA-3846
 URL: https://issues.apache.org/jira/browse/KAFKA-3846
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 0.10.0.0
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
 Fix For: 0.10.1.0


Timestamps were added to records in the previous release; however, this does not 
get propagated automatically to Connect because it uses custom wrappers to add 
fields and rename some for clarity.

The addition of timestamps should be trivial, but can be really useful (e.g. in 
sink connectors that would like to include timestamp info if available, even 
when it is not stored in the value).

This is public API so it will need a KIP despite being very uncontentious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3845) Support per-connector converters

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)
Ewen Cheslack-Postava created KAFKA-3845:


 Summary: Support per-connector converters
 Key: KAFKA-3845
 URL: https://issues.apache.org/jira/browse/KAFKA-3845
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.10.0.0
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Critical
 Fix For: 0.10.1.0


While good for default configuration and reducing the total configuration the 
user needs to do, it's inconvenient to require that all connectors on a cluster 
use the same converter. It's definitely a good idea to stay consistent, 
but occasionally you may need a special converter, e.g. one source of data 
happens to come in JSON despite you standardizing on Avro.

Note that these configs are connector-level in the sense that the entire 
connector should use a single converter type, but since converters are used by 
tasks the config needs to be automatically propagated to tasks.

This is effectively a public API change as it is adding a built-in config for 
connectors/tasks, so this probably requires a KIP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
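
A config sketch of the idea: the worker keeps its cluster-wide default while
one connector overrides it (the per-connector placement is the proposal here,
not an existing feature):

{code}
# Worker config -- the cluster-wide default, as today:
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter

# Connector config -- hypothetical per-connector override,
# automatically propagated to the connector's tasks:
# value.converter=org.apache.kafka.connect.json.JsonConverter
{code}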


[jira] [Updated] (KAFKA-3845) Support per-connector converters

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3845:
-
Issue Type: Improvement  (was: Bug)

> Support per-connector converters
> 
>
> Key: KAFKA-3845
> URL: https://issues.apache.org/jira/browse/KAFKA-3845
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> While good for default configuration and reducing the total configuration the 
> user needs to do, it's inconvenient to require that all connectors on a 
> cluster use the same converter. It's definitely a good idea to stay 
> consistent, but occasionally you may need a special converter, e.g. one 
> source of data happens to come in JSON despite you standardizing on Avro.
> Note that these configs are connector-level in the sense that the entire 
> connector should use a single converter type, but since converters are used 
> by tasks the config needs to be automatically propagated to tasks.
> This is effectively a public API change as it is adding a built-in config for 
> connectors/tasks, so this probably requires a KIP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3351) Kafka Connect Rest restarts all connectors on POST or DELETE

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3351:
-
Fix Version/s: (was: 0.10.1.0)

> Kafka Connect Rest restarts all connectors on POST or DELETE
> 
>
> Key: KAFKA-3351
> URL: https://issues.apache.org/jira/browse/KAFKA-3351
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Andrew Stevenson
>Assignee: Jason Gustafson
>
> When running Kafka Connect in distributed mode, both posting a new connector 
> and deleting an existing connector restarts all the connectors.
> For example, in my testing if I start a Cassandra Sink and an Elastic Sink 
> via Rest then stop the Elastic sink via Rest then the Elastic Sink stops (as 
> expected) but the Cassandra Sink restarts.
> I wouldn't have expected sinks or sources other than the one my request 
> concerns to be affected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2483) Add support for batch/scheduled Copycat connectors

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2483:
-
Fix Version/s: (was: 0.10.1.0)

> Add support for batch/scheduled Copycat connectors
> --
>
> Key: KAFKA-2483
> URL: https://issues.apache.org/jira/browse/KAFKA-2483
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
>
> A few connectors may not perform well if run continuously; for example, HDFS 
> may not cope well with a task holding a file open for very long periods of time.
> These connectors will work better if they can schedule themselves to be 
> executed periodically. Note that this cannot currently be implemented by the 
> connector itself because sink connectors get data delivered to them as it 
> streams in. However, the combination of KAFKA-2481 and KAFKA-2482 would make 
> it possible, if inconvenient, for the task itself to implement this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3209:
-
Priority: Critical  (was: Major)

> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Neha Narkhede
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3832) Kafka Connect's JSON Converter never outputs a null value

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3832:
-
Fix Version/s: 0.10.1.0

> Kafka Connect's JSON Converter never outputs a null value
> -
>
> Key: KAFKA-3832
> URL: https://issues.apache.org/jira/browse/KAFKA-3832
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Randall Hauch
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.1.0
>
>
> Kafka Connect's JSON Converter will never output a null value when 
> {{enableSchemas=true}}. This means that when a connector outputs a 
> {{SourceRecord}} with a null value, the JSON Converter will always produce a 
> message value with:
> {code:json}
> {
>   "schema": null,
>   "payload": null
> }
> {code}
> And, this means that while Kafka log compaction will always be able to remove 
> earlier messages with the same key, log compaction will never remove _all_ of 
> the messages with the same key. 
> The JSON Converter's {{fromConnectData(...)}} should always return null when 
> it is supplied a null value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
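
The proposed behavior, sketched as a short-circuit at the top of the
converter (serializeEnvelope stands in for the existing envelope path):

{code:java}
@Override
public byte[] fromConnectData(String topic, Schema schema, Object value) {
    if (value == null)
        return null;   // a true tombstone, so log compaction can purge the key
    return serializeEnvelope(schema, value);   // existing behavior, elided
}
{code}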


[jira] [Comment Edited] (KAFKA-2925) NullPointerException if FileStreamSinkTask is stopped before initialization finishes

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332159#comment-15332159
 ] 

Ewen Cheslack-Postava edited comment on KAFKA-2925 at 6/15/16 5:38 PM:
---

[~hachikuji] Can you remember if we've gotten the start()/stop() behavior into 
better shape now such that this isn't possible? I remember a few patches around 
that ordering -- specifically, I think we may be managing this at the framework 
level such that the stop() won't be invoked until start() is both invoked and 
completed.


was (Author: ewencp):
[~hachikuji] Can you remember if we've gotten the start()/stop() behavior into 
better shape now such that this isn't possible? I remember a few patches around 
that ordering.

> NullPointerException if FileStreamSinkTask is stopped before initialization 
> finishes
> 
>
> Key: KAFKA-2925
> URL: https://issues.apache.org/jira/browse/KAFKA-2925
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
>
> If a FileStreamSinkTask is stopped too quickly after a distributed herder 
> rebalances work, it can result in cleanup happening without start() ever 
> being called:
> {quote}
> Sink task org.apache.kafka.connect.runtime.WorkerSinkTask@f9ac651 was stopped 
> before completing join group. Task initialization and start is being skipped 
> (org.apache.kafka.connect.runtime.WorkerSinkTask:150)
> {quote}
> This is actually a bit weird since stop() is still called so resources 
> allocated in the constructor can be cleaned up, but possibly unexpected that 
> stop() will be called without start() ever being called.
> Because the code in FileStreamSinkTask's stop() method assumes start() has 
> been called, it can result in a NullPointerException because it assumes the 
> PrintStream is already initialized.
> The easy fix is to check for nulls before closing. However, we should 
> probably also consider whether the currently possible sequence of events is 
> confusing and if we should not invoke stop() and make it clear in the SinkTask 
> interface that you should only initialize stuff in the constructor that won't 
> need any manual cleanup later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
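
The "easy fix" from the description, sketched against the task's stop()
(the field name is illustrative; the System.out guard mirrors the file
connector's console-output mode):

{code:java}
@Override
public void stop() {
    // start() may never have run, so outputStream can still be null here.
    if (outputStream != null && outputStream != System.out)
        outputStream.close();
}
{code}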


[jira] [Commented] (KAFKA-2925) NullPointerException if FileStreamSinkTask is stopped before initialization finishes

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332159#comment-15332159
 ] 

Ewen Cheslack-Postava commented on KAFKA-2925:
--

[~hachikuji] Can you remember if we've gotten the start()/stop() behavior into 
better shape now such that this isn't possible? I remember a few patches around 
that ordering.

> NullPointerException if FileStreamSinkTask is stopped before initialization 
> finishes
> 
>
> Key: KAFKA-2925
> URL: https://issues.apache.org/jira/browse/KAFKA-2925
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
>
> If a FileStreamSinkTask is stopped too quickly after a distributed herder 
> rebalances work, it can result in cleanup happening without start() ever 
> being called:
> {quote}
> Sink task org.apache.kafka.connect.runtime.WorkerSinkTask@f9ac651 was stopped 
> before completing join group. Task initialization and start is being skipped 
> (org.apache.kafka.connect.runtime.WorkerSinkTask:150)
> {quote}
> This is actually a bit weird since stop() is still called so resources 
> allocated in the constructor can be cleaned up, but possibly unexpected that 
> stop() will be called without start() ever being called.
> Because the code in FileStreamSinkTask's stop() method assumes start() has 
> been called, it can result in a NullPointerException because it assumes the 
> PrintStream is already initialized.
> The easy fix is to check for nulls before closing. However, we should 
> probably also consider whether the currently possible sequence of events is 
> confusing and if we should not invoke stop() and make it clear in the SinkTask 
> interface that you should only initialize stuff in the constructor that won't 
> need any manual cleanup later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2941) Docs for key/value converter in Kafka connect are unclear

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2941:
-
Fix Version/s: 0.10.1.0

> Docs for key/value converter in Kafka connect are unclear
> -
>
> Key: KAFKA-2941
> URL: https://issues.apache.org/jira/browse/KAFKA-2941
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> These docs don't really explain what the configs do or why users might want 
> to change them.
> Via [~gwenshap], something like this would be better: "Converter class for 
> key Connect data. This controls the format of the data that will be written 
> either to Kafka or to a sink system. Popular formats include Json and Avro"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2941) Docs for key/value converter in Kafka connect are unclear

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2941:
-
Priority: Blocker  (was: Minor)

Bumping up the priority here as there's a seemingly unending stream of 
questions and confusion around this subject.

> Docs for key/value converter in Kafka connect are unclear
> -
>
> Key: KAFKA-2941
> URL: https://issues.apache.org/jira/browse/KAFKA-2941
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> These docs don't really explain what the configs do or why users might want 
> to change them.
> Via [~gwenshap], something like this would be better: "Converter class for 
> key Connect data. This controls the format of the data that will be written 
> either to Kafka or to a sink system. Popular formats include Json and Avro"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3054:
-
 Priority: Blocker  (was: Major)
Fix Version/s: 0.10.1.0

We improved error handling in 0.10.0.0. In theory we should be catching and 
handling these errors, marking the connector/task as dead. However, we should 
make sure we have a test covering this specific case to validate the handling 
before marking this resolved. And we should make sure we cover *both* types of 
invalid configs for *both* types of connectors -- since we do some initial 
parsing of configs to set up the connector within the framework and then the 
connector does its own parsing, we should make sure all failure scenarios are 
handled.

> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: jin xing
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> Connector Herder throws ConnectException and shuts down if sent a wrong 
> config; restarting the herder will keep failing with the wrong config. It 
> makes sense that the herder should stay available when starting a connector 
> or task fails. After receiving a delete connector request, the herder can 
> delete the wrong config from "config storage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-55: Secure quotas for authenticated users

2016-06-15 Thread Rajini Sivaram
Harsha,

The sample configuration under
https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users#KIP-55:SecureQuotasforAuthenticatedUsers-QuotaConfiguration
shows the Zookeeper data for different scenarios.

   - *user1* (/users/user1 in ZK) has only user-level quotas
   - *user2* (/users/user2 in ZK) defines user-level quotas and sub-quotas
   for clients *clientA* and *clientB*. Other client-ids of *user2* share
   the remaining quota of *user2*.
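
For readers without the wiki handy, the znode data for those two scenarios
might look roughly like this (values are illustrative; the KIP page has the
authoritative layout):

   /users/user1  (user-level quota only):
     { "producer_byte_rate" : "1024", "consumer_byte_rate" : "2048" }

   /users/user2  (user-level quota plus per-client sub-quotas):
     { "producer_byte_rate" : "4096", "consumer_byte_rate" : "8192",
       "clients" : {
         "clientA" : { "producer_byte_rate" : "512" },
         "clientB" : { "producer_byte_rate" : "512" } } }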


On Wed, Jun 15, 2016 at 5:30 PM, Harsha  wrote:

> Rajini,
>   How do sub-quotas work in the case of authenticated users?
>   Where are we maintaining the relation between users and their
>   client Ids? Can you add an example of zk data under /users.
> Thanks,
> Harsha
>
> On Mon, Jun 13, 2016, at 05:01 AM, Rajini Sivaram wrote:
> > I have updated KIP-55 to reflect the changes from the discussions in the
> > voting thread (
> > https://www.mail-archive.com/dev@kafka.apache.org/msg51610.html).
> >
> > Jun/Gwen,
> >
> > Existing client-id quotas will be used as default client-id quotas for
> > users when no user quotas are configured - i.e., default user quota is
> > unlimited and no user-specific quota override is specified. This enables
> > user rate limits to be configured for ANONYMOUS if required in a cluster
> > that has both PLAINTEXT and SSL/SASL. By default, without any user rate
> > limits set, rate limits for client-ids will apply, retaining the current
> > client-id quota configuration for single-user clusters.
> >
> > Zookeeper will have two paths /clients with client-id quotas that apply
> > only when user quota is unlimited similar to now. And /users which
> > persists
> > user quotas for any user including ANONYMOUS.
> >
> > Comments and feedback are appreciated.
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Wed, Jun 8, 2016 at 9:00 PM, Rajini Sivaram
> >  > > wrote:
> >
> > > Jun,
> > >
> > > Oops, sorry, I hadn't realized that the last note was on the discuss
> > > thread. Thank you for pointing it out. I have sent another note for
> voting.
> > >
> > >
> > > On Wed, Jun 8, 2016 at 4:30 PM, Jun Rao  wrote:
> > >
> > >> Rajini,
> > >>
> > >> Perhaps it will be clearer if you start the voting in a new thread
> (with
> > >> VOTE in the subject).
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Tue, Jun 7, 2016 at 1:55 PM, Rajini Sivaram <
> > >> rajinisiva...@googlemail.com
> > >> > wrote:
> > >>
> > >> > I would like to initiate the vote for KIP-55.
> > >> >
> > >> > The KIP details are here: KIP-55: Secure quotas for authenticated
> users
> > >> > <
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
> > >> > >
> > >> > .
> > >> >
> > >> > The JIRA  KAFKA-3492  <
> https://issues.apache.org/jira/browse/KAFKA-3492
> > >> > >has
> > >> > a draft PR here: https://github.com/apache/kafka/pull/1256.
> > >> >
> > >> > Thank you...
> > >> >
> > >> >
> > >> > Regards,
> > >> >
> > >> > Rajini
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Rajini
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Rajini
>



-- 
Regards,

Rajini


[jira] [Updated] (KAFKA-3008) Connect should parallelize task start/stop

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3008:
-
Priority: Minor  (was: Major)

> Connect should parallelize task start/stop
> --
>
> Key: KAFKA-3008
> URL: https://issues.apache.org/jira/browse/KAFKA-3008
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Liquan Pei
>Priority: Minor
>
> The Herder implementations currently iterate over all connectors/tasks and 
> sequentially start/stop them. We should parallelize this. This is less 
> critical for {{StandaloneHerder}}, but pretty important for 
> {{DistributedHerder}} since it will generally be managing more tasks and any 
> delay starting/stopping a single task will impact every other task on the 
> node (and can ultimately result in incorrect behavior in the case of a single 
> offset commit in one connector taking too long preventing all of the rest 
> from committing offsets).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
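
A sketch of the parallel variant with a worker pool (names and pool sizing
are illustrative, not the herder's actual structure):

{code:java}
void startTasks(Collection<ConnectorTaskId> taskIds) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<?>> startups = new ArrayList<>();
    for (ConnectorTaskId id : taskIds)
        startups.add(pool.submit(() -> worker.startTask(id)));   // worker is illustrative
    for (Future<?> f : startups)
        f.get();   // one slow or failed task no longer serializes the rest
    pool.shutdown();
}
{code}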


[jira] [Updated] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3209:
-
Fix Version/s: 0.10.1.0

> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Neha Narkhede
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2378) Add Copycat embedded API

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2378:
-
Fix Version/s: (was: 0.10.1.0)

> Add Copycat embedded API
> 
>
> Key: KAFKA-2378
> URL: https://issues.apache.org/jira/browse/KAFKA-2378
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> Much of the required Copycat API will exist from previous patches since any 
> main() method will need to do very similar operations. However, integrating 
> with any other Java code may require additional API support.
> For example, one of the use cases when integrating with any stream processing 
> application will require knowing which topics will be written to. We will 
> need to add APIs to expose the topics a registered connector is writing to so 
> they can be consumed by a stream processing task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3073) KafkaConnect should support regular expression for topics

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3073:
-
Issue Type: Improvement  (was: Bug)

> KafkaConnect should support regular expression for topics
> -
>
> Key: KAFKA-3073
> URL: https://issues.apache.org/jira/browse/KAFKA-3073
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Liquan Pei
> Fix For: 0.10.1.0
>
>
> KafkaConsumer supports both a list of topics or a pattern when subscribing. 
> KafkaConnect only supports a list of topics, which is not just more of a 
> hassle to configure - it also requires more maintenance.
> I suggest adding topics.pattern as a new configuration and letting users 
> choose between 'topics' or 'topics.pattern'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3073) KafkaConnect should support regular expression for topics

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3073:
-
Fix Version/s: 0.10.1.0

> KafkaConnect should support regular expression for topics
> -
>
> Key: KAFKA-3073
> URL: https://issues.apache.org/jira/browse/KAFKA-3073
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Liquan Pei
> Fix For: 0.10.1.0
>
>
> KafkaConsumer supports both a list of topics or a pattern when subscribing. 
> KafkaConnect only supports a list of topics, which is not just more of a 
> hassle to configure - it also requires more maintenance.
> I suggest adding topics.pattern as a new configuration and letting users 
> choose between 'topics' or 'topics.pattern'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3816) Provide more context in Kafka Connect log messages

2016-06-15 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3816:
-
Priority: Critical  (was: Major)

> Provide more context in Kafka Connect log messages
> --
>
> Key: KAFKA-3816
> URL: https://issues.apache.org/jira/browse/KAFKA-3816
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Randall Hauch
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>
> Currently it is relatively difficult to correlate individual log messages 
> with the various threads and activities that are going on within a Kafka 
> Connect worker, let alone a cluster of workers. Log messages should provide 
> more context to make this correlation easier and to allow log scraping tools 
> to coalesce related log messages.
> One simple way to do this is by using _mapped diagnostic contexts_, or MDC. 
> This is supported by the SLF4J API, and by the Logback and Log4J logging 
> frameworks.
> Basically, the framework would be changed so that each thread is configured 
> with one or more MDC parameters using the 
> {{org.slf4j.MDC.put(String,String)}} method in SLF4J. Once that thread is 
> configured, all log messages made using that thread have that context. The 
> logs can then be configured to use those parameters.
> It would be ideal to define a convention for connectors and the Kafka Connect 
> framework. A single set of MDC parameters means that the logging framework 
> can use the specific parameters in its message formats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
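
A sketch of the convention using SLF4J's MDC (key names are illustrative,
not a settled convention):

{code:java}
MDC.put("connector", connectorName);
MDC.put("task", Integer.toString(taskId));
try {
    // Every log line from this thread now carries the context; a pattern
    // such as %X{connector}-%X{task} in the log config renders it.
    log.info("Committing offsets");
} finally {
    MDC.remove("connector");
    MDC.remove("task");
}
{code}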

