[jira] [Updated] (KAFKA-3965) Mirror maker sync send fails will lose messages

2016-09-26 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-3965:

Status: Patch Available  (was: Open)

> Mirror maker sync send fails will lose messages
> ---
>
> Key: KAFKA-3965
> URL: https://issues.apache.org/jira/browse/KAFKA-3965
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: NieWang
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> 1. The source Kafka cluster stores some messages, each 500 bytes in size.
> 2. The mirror maker producer is configured with:
>  producer.type=sync
>  max.request.size=400
> 3. Start the mirror maker to back up messages from the source Kafka cluster
> to the destination Kafka cluster; the mirror maker quits because the
> messages are larger than 400 bytes.
> 4. Checking the source Kafka cluster shows the consumed offset has been
> advanced to 1.
> 5. Checking the destination Kafka cluster shows no messages at all.
> 6. Remove the producer config max.request.size=400 and start the mirror
> maker again. After the backup finishes, the first message has been lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3994) Deadlock between consumer heartbeat expiration and offset commit.

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3994:
---
Fix Version/s: 0.10.1.0

> Deadlock between consumer heartbeat expiration and offset commit.
> -
>
> Key: KAFKA-3994
> URL: https://issues.apache.org/jira/browse/KAFKA-3994
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0, 0.10.0.2
>
>
> I got the following stacktraces from ConsumerBounceTest
> {code}
> ...
> "Test worker" #12 prio=5 os_prio=0 tid=0x7fbb28b7f000 nid=0x427c runnable 
> [0x7fbb06445000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x0003d48bcbc0> (a sun.nio.ch.Util$2)
> - locked <0x0003d48bcbb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x0003d48bbd28> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:454)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:277)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:411)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1086)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1054)
> at 
> kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:103)
> at 
> kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:64)
> at 
> 

[GitHub] kafka pull request #1915: KAFKA-3965 mirror maker should not commit offset w...

2016-09-26 Thread becketqin
GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/1915

KAFKA-3965 mirror maker should not commit offset when exception is thrown 
from producer.send()



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-3965

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1915.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1915


commit 0893c16af702cb896d94c3ec63034508f2679ab4
Author: Jiangjie Qin 
Date:   2016-09-27T05:05:18Z

KAFKA-3965 mirror maker should not commit offset when exception is thrown 
from producer.send()




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3965) Mirror maker sync send fails will lose messages

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525087#comment-15525087
 ] 

ASF GitHub Bot commented on KAFKA-3965:
---

GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/1915

KAFKA-3965 mirror maker should not commit offset when exception is thrown 
from producer.send()



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-3965

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1915.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1915


commit 0893c16af702cb896d94c3ec63034508f2679ab4
Author: Jiangjie Qin 
Date:   2016-09-27T05:05:18Z

KAFKA-3965 mirror maker should not commit offset when exception is thrown 
from producer.send()




> Mirror maker sync send fails will lose messages
> ---
>
> Key: KAFKA-3965
> URL: https://issues.apache.org/jira/browse/KAFKA-3965
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: NieWang
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> 1. The source Kafka cluster stores some messages, each 500 bytes in size.
> 2. The mirror maker producer is configured with:
>  producer.type=sync
>  max.request.size=400
> 3. Start the mirror maker to back up messages from the source Kafka cluster
> to the destination Kafka cluster; the mirror maker quits because the
> messages are larger than 400 bytes.
> 4. Checking the source Kafka cluster shows the consumed offset has been
> advanced to 1.
> 5. Checking the destination Kafka cluster shows no messages at all.
> 6. Remove the producer config max.request.size=400 and start the mirror
> maker again. After the backup finishes, the first message has been lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3965) Mirror maker sync send fails will lose messages

2016-09-26 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin reassigned KAFKA-3965:
---

Assignee: Jiangjie Qin

> Mirror maker sync send fails will lose messages
> ---
>
> Key: KAFKA-3965
> URL: https://issues.apache.org/jira/browse/KAFKA-3965
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: NieWang
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> 1. The source Kafka cluster stores some messages, each 500 bytes in size.
> 2. The mirror maker producer is configured with:
>  producer.type=sync
>  max.request.size=400
> 3. Start the mirror maker to back up messages from the source Kafka cluster
> to the destination Kafka cluster; the mirror maker quits because the
> messages are larger than 400 bytes.
> 4. Checking the source Kafka cluster shows the consumed offset has been
> advanced to 1.
> 5. Checking the destination Kafka cluster shows no messages at all.
> 6. Remove the producer config max.request.size=400 and start the mirror
> maker again. After the backup finishes, the first message has been lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3266) List/Alter Acls - protocol and server side implementation

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3266:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> List/Alter Acls - protocol and server side implementation
> -
>
> Key: KAFKA-3266
> URL: https://issues.apache.org/jira/browse/KAFKA-3266
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Grant Henke
>Assignee: Grant Henke
> Fix For: 0.10.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3405) Deduplicate and break out release tasks

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3405:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Deduplicate and break out release tasks
> ---
>
> Key: KAFKA-3405
> URL: https://issues.apache.org/jira/browse/KAFKA-3405
> Project: Kafka
>  Issue Type: Sub-task
>  Components: build
>Reporter: Grant Henke
>Assignee: Grant Henke
> Fix For: 0.10.2.0
>
>
> Tasks like copyDependentLibs are repeated throughout the build. Other tasks
> like releaseTarGz should be moved out of the core module.
> While refactoring this code, other optimizations, like ensuring sources and
> javadoc jars are not included in the classpath, should be done as well.
> If it makes sense, moving the release tasks to a separate gradle file is
> preferred.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3965) Mirror maker sync send fails will lose messages

2016-09-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525079#comment-15525079
 ] 

Jiangjie Qin commented on KAFKA-3965:
-

Yeah, there is a bug that may cause this problem. We need to set 
{{exitOnSendFailure}} to true when producer.send() throws an exception. I'll 
submit a patch for it. Thanks for reporting this issue.
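
A minimal sketch of the pattern the comment describes, in new-producer Java
terms (names and structure here are illustrative, not the actual patch):
remember that a send failed and skip the offset commit afterwards, so the
record is re-mirrored after restart instead of being dropped.

{code}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Simplified sketch, not MirrorMaker's actual code.
final class ExitOnSendFailureSketch {
    private volatile boolean exitOnSendFailure = false; // flag named in the comment above

    void mirror(KafkaProducer<byte[], byte[]> producer,
                ProducerRecord<byte[], byte[]> record) {
        try {
            producer.send(record).get();   // sync send: block until the result is known
        } catch (Exception e) {
            exitOnSendFailure = true;      // remember the failure; the mirror should exit
        }
    }

    void maybeCommit(Runnable commitSourceOffsets) {
        if (!exitOnSendFailure)
            commitSourceOffsets.run();     // commit only if every send so far succeeded
    }
}
{code}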

> Mirror maker sync send fails will lose messages
> ---
>
> Key: KAFKA-3965
> URL: https://issues.apache.org/jira/browse/KAFKA-3965
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: NieWang
> Fix For: 0.10.1.0
>
>
> 1. The source Kafka cluster stores some messages, each 500 bytes in size.
> 2. The mirror maker producer is configured with:
>  producer.type=sync
>  max.request.size=400
> 3. Start the mirror maker to back up messages from the source Kafka cluster
> to the destination Kafka cluster; the mirror maker quits because the
> messages are larger than 400 bytes.
> 4. Checking the source Kafka cluster shows the consumed offset has been
> advanced to 1.
> 5. Checking the destination Kafka cluster shows no messages at all.
> 6. Remove the producer config max.request.size=400 and start the mirror
> maker again. After the backup finishes, the first message has been lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1548) Refactor the "replica_id" in requests

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1548:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Refactor the "replica_id" in requests
> -
>
> Key: KAFKA-1548
> URL: https://issues.apache.org/jira/browse/KAFKA-1548
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Gwen Shapira
>  Labels: newbie
> Fix For: 0.10.2.0
>
>
> Today in many requests like fetch and offset we have an integer replica_id 
> field. If the request is from a follower, it is the broker id of that 
> follower replica; if it is from a regular consumer, it can be one of 
> two values: "-1" for an ordinary consumer, or "-2" for a debugging consumer. 
> Hence this replica_id field is used in two ways:
> 1) Logging for troubleshooting in request logs, which is helpful only 
> when the request is from a follower replica.
> 2) Deciding whether the request is from a consumer or a replica so it can be 
> handled in logically different ways. For this purpose we do not really care 
> about the actual id value.
> We would probably like to make the following improvements:
> 1) Rename "replica_id" to something less confusing?
> 2) Change the request.toString() function based on the replica_id, whether it 
> is a positive integer (meaning from a broker replica fetcher) or -1/-2 
> (meaning from a regular consumer).
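
A hypothetical helper (not Kafka code) makes the two roles above concrete:

{code}
// Hypothetical helper, not Kafka code: renders a replica_id for request
// logging using the three cases described in the issue.
final class ReplicaIdSketch {
    static String describe(int replicaId) {
        if (replicaId >= 0)  return "follower replica on broker " + replicaId;
        if (replicaId == -1) return "ordinary consumer";
        if (replicaId == -2) return "debugging consumer";
        return "unknown replica_id " + replicaId;
    }

    public static void main(String[] args) {
        System.out.println(describe(3));   // follower replica on broker 3
        System.out.println(describe(-1));  // ordinary consumer
    }
}
{code}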



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1043) Time-consuming FetchRequest could block other request in the response queue

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1043:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Time-consuming FetchRequest could block other request in the response queue
> ---
>
> Key: KAFKA-1043
> URL: https://issues.apache.org/jira/browse/KAFKA-1043
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>  Labels: newbie++
> Fix For: 0.10.2.0
>
>
> Since in SocketServer the processor that takes a request is also responsible 
> for writing the response for that request, each processor owns its own 
> response queue. If a FetchRequest takes an irregularly long time to write to 
> the channel buffer, it blocks all other responses in the queue.
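
A schematic sketch of the effect (reduced to a plain FIFO; nothing here is
SocketServer's real code): the tiny response can only complete after the slow
one ahead of it.

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Schematic sketch, not SocketServer code: one FIFO response queue per
// processor means a single slow write at the head delays everything behind it.
public class ResponseQueueSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> responseQueue = new LinkedBlockingQueue<>();
        responseQueue.put(() -> sleep(2_000)); // slow FetchRequest response
        responseQueue.put(() -> sleep(1));     // tiny response, ready immediately

        long start = System.nanoTime();
        while (!responseQueue.isEmpty())
            responseQueue.take().run();        // the processor drains strictly in order
        System.out.printf("tiny response done after ~%d ms%n",
                (System.nanoTime() - start) / 1_000_000); // ~2001 ms, not ~1 ms
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
{code}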



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3913) Old consumer's metrics error when using IPv6

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-3913.

Resolution: Duplicate

Marking this as a duplicate of KAFKA-3930, which already has a patch available.

> Old consumer's metrics error when using IPv6 
> -
>
> Key: KAFKA-3913
> URL: https://issues.apache.org/jira/browse/KAFKA-3913
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Pengwei
> Fix For: 0.10.1.0
>
>
> The error is below:
> [2016-05-09 15:49:20,096] WARN Property security.protocol is not valid 
> (kafka.utils.VerifiableProperties)
> [2016-05-09 15:49:20,882] WARN Error processing 
> kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=console-consumer-32775,brokerHost=fe80::92e2:baff:fe07:51cc,brokerPort=9093
>  (com.yammer.metrics.reporting.JmxReporter)
> javax.management.MalformedObjectNameException: Invalid character ':' in value 
> part of property
> at javax.management.ObjectName.construct(ObjectName.java:618)
> at javax.management.ObjectName.(ObjectName.java:1382)
> at 
> com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
> at 
> com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
> at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
> at com.yammer.metrics.core.MetricsRegistry.newMeter(MetricsRegistry.java:240)
> at kafka.metrics.KafkaMetricsGroup$class.newMeter(KafkaMetricsGroup.scala:79)
> at kafka.server.FetcherStats.newMeter(AbstractFetcherThread.scala:264)
> at kafka.server.FetcherStats.(AbstractFetcherThread.scala:269)
> at kafka.server.AbstractFetcherThread.(AbstractFetcherThread.scala:55)
> at kafka.consumer.ConsumerFetcherThread.(ConsumerFetcherThread.scala:38)
> at 
> kafka.consumer.ConsumerFetcherManager.createFetcherThread(ConsumerFetcherManager.scala:118)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
> at 
> kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
> at 
> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
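
The root cause is the unescaped ':' characters of the IPv6 address in the
value part of the JMX ObjectName. A minimal standalone demonstration
(ObjectName.quote is the standard JMX escape; whether the KAFKA-3930 patch
uses it is not stated here):

{code}
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class Ipv6JmxNameDemo {
    public static void main(String[] args) throws Exception {
        String host = "fe80::92e2:baff:fe07:51cc"; // IPv6 broker host from the log above

        // Unquoted ':' in a value part is illegal, which is exactly what
        // JmxReporter trips over when registering the metric.
        try {
            new ObjectName("kafka.server:type=FetcherStats,brokerHost=" + host);
        } catch (MalformedObjectNameException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Quoting the value escapes the colons and yields a valid name.
        ObjectName ok = new ObjectName(
                "kafka.server:type=FetcherStats,brokerHost=" + ObjectName.quote(host));
        System.out.println("accepted: " + ok);
    }
}
{code}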



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4222) Transient failure in QueryableStateIntegrationTest.queryOnRebalance

2016-09-26 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525049#comment-15525049
 ] 

Guozhang Wang commented on KAFKA-4222:
--

[~enothereska] Could you take a look at this?

> Transient failure in QueryableStateIntegrationTest.queryOnRebalance
> ---
>
> Key: KAFKA-4222
> URL: https://issues.apache.org/jira/browse/KAFKA-4222
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Jason Gustafson
>Assignee: Eno Thereska
> Fix For: 0.10.1.0
>
>
> Seen here: https://builds.apache.org/job/kafka-trunk-jdk8/915/console
> {code}
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
> queryOnRebalance[1] FAILED
> java.lang.AssertionError: Condition not met within timeout 3. waiting 
> for metadata, store and value to be non null
> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:276)
> at 
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest.verifyAllKVKeys(QueryableStateIntegrationTest.java:263)
> at 
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:342)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk7 #1572

2016-09-26 Thread Apache Jenkins Server
See 



[jira] [Updated] (KAFKA-4222) Transient failure in QueryableStateIntegrationTest.queryOnRebalance

2016-09-26 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4222:
-
Assignee: Eno Thereska  (was: Guozhang Wang)

> Transient failure in QueryableStateIntegrationTest.queryOnRebalance
> ---
>
> Key: KAFKA-4222
> URL: https://issues.apache.org/jira/browse/KAFKA-4222
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Jason Gustafson
>Assignee: Eno Thereska
> Fix For: 0.10.1.0
>
>
> Seen here: https://builds.apache.org/job/kafka-trunk-jdk8/915/console
> {code}
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
> queryOnRebalance[1] FAILED
> java.lang.AssertionError: Condition not met within timeout 3. waiting 
> for metadata, store and value to be non null
> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:276)
> at 
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest.verifyAllKVKeys(QueryableStateIntegrationTest.java:263)
> at 
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:342)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3947) kafka-reassign-partitions.sh should support dumping current assignment

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3947:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> kafka-reassign-partitions.sh should support dumping current assignment
> --
>
> Key: KAFKA-3947
> URL: https://issues.apache.org/jira/browse/KAFKA-3947
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.10.0.0
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> While building my own tool to perform reassignment of partitions, I realized 
> that there's no way to dump the current partition assignment in a 
> machine-parsable format such as JSON.
> Giving the {{\-\-generate}} option to the kafka-reassign-partitions.sh 
> script does dump the current assignment of the topics given by 
> {{\-\-topics-to-assign-json-file}}, but it's very inconvenient because:
> - I want the dump to contain all topics. That is, I want to skip generating 
> the list of current topics just to pass it to the generate command.
> - The output is concatenated with the result of the reassignment, so I can't 
> simply do something like: {{kafka-reassign-partitions.sh --generate ... > 
> current-assignment.json}}
> - There's no need to ask Kafka to generate a reassignment just to get the 
> current assignment in the first place.
> Here I'd like to add a {{\-\-dump}} option to kafka-reassign-partitions.sh.
> I was wondering whether this functionality should be provided by 
> {{kafka-reassign-partitions.sh}} or {{kafka-topics.sh}}, but now I think 
> {{kafka-reassign-partitions.sh}} is the proper home, since the resulting 
> JSON should be in the format of {{\-\-reassignment-json-file}}, which 
> belongs to this command.
> Will follow up with a patch implementing this shortly.
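
For reference, the {{\-\-reassignment-json-file}} layout such a dump would
emit is the standard one below (topic name and replica ids are made up for
illustration):

{code}
{"version": 1,
 "partitions": [
   {"topic": "my-topic", "partition": 0, "replicas": [1, 2]},
   {"topic": "my-topic", "partition": 1, "replicas": [2, 3]}
 ]}
{code}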



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3989) Add JMH module for Benchmarks

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3989:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Add JMH module for Benchmarks
> -
>
> Key: KAFKA-3989
> URL: https://issues.apache.org/jira/browse/KAFKA-3989
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
> Fix For: 0.10.2.0
>
>
> JMH is a Java harness for building, running, and analyzing benchmarks written 
> in Java or other JVM languages. To run properly, JMH needs to live in its own 
> module. This task will also investigate using the jmh-gradle-plugin 
> [https://github.com/melix/jmh-gradle-plugin], which enables the use of JMH 
> from gradle. This is related to 
> [https://issues.apache.org/jira/browse/KAFKA-3973]
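
A minimal sketch of what a benchmark in the proposed module could look like
(hypothetical class; only standard JMH annotations are used):

{code}
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

// Hypothetical example benchmark; JMH generates the measurement harness
// around any public method annotated with @Benchmark.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ExampleBenchmark {
    @Benchmark
    public long measureNanoTimeCall() {
        return System.nanoTime(); // trivial workload; real benchmarks would exercise Kafka code paths
    }
}
{code}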



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2334) Prevent HW from going back during leader failover

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2334:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Prevent HW from going back during leader failover 
> --
>
> Key: KAFKA-2334
> URL: https://issues.apache.org/jira/browse/KAFKA-2334
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.8.2.1
>Reporter: Guozhang Wang
>Assignee: Neha Narkhede
> Fix For: 0.10.2.0
>
>
> Consider the following scenario:
> 0. Kafka uses a replication factor of 2, with broker B1 as the leader and B2 
> as the follower.
> 1. A producer keeps sending to Kafka with ack=-1.
> 2. A consumer repeatedly issues ListOffset requests to Kafka.
> And the following sequence:
> 0. B1's current log-end-offset (LEO) is 0, HW-offset 0; same for B2.
> 1. B1 receives a ProduceRequest of 100 messages, appends them to its local 
> log (LEO becomes 100), and holds the request in purgatory.
> 2. B1 receives a FetchRequest starting at offset 0 from follower B2 and 
> returns the 100 messages.
> 3. B2 appends the received messages to its local log (LEO becomes 100).
> 4. B1 receives another FetchRequest starting at offset 100 from B2, learns 
> that B2's LEO has caught up to 100, hence updates its own HW, satisfies the 
> ProduceRequest in purgatory, and sends the FetchResponse with HW 100 back to 
> B2 ASYNCHRONOUSLY.
> 5. B1 successfully sends the ProduceResponse to the producer and then fails; 
> hence the FetchResponse does not reach B2, whose HW remains 0.
> From the consumer's point of view, it can first see the latest offset of 
> 100 (from B1), then see the latest offset of 0 (from B2), and then watch the 
> latest offset gradually catch up to 100.
> This is because we use HW to guard both ListOffset and 
> fetch-from-ordinary-consumer.
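
The sequence condenses into a small worked example (offsets taken directly
from the steps above):

{code}
// Worked restatement of the sequence above, with the offsets as plain fields.
// It shows why a consumer's ListOffset (which is guarded by HW) can observe
// the latest offset going from 100 back to 0.
public class HwRegressionSketch {
    static long b1Leo = 0, b1Hw = 0;   // broker B1 (leader)
    static long b2Leo = 0, b2Hw = 0;   // broker B2 (follower)

    public static void main(String[] args) {
        b1Leo = 100;   // step 1: B1 appends 100 messages
        b2Leo = 100;   // steps 2-3: B2 fetches and appends them
        b1Hw = 100;    // step 4: B1 sees B2's LEO=100 and advances its HW; the
                       //   FetchResponse carrying HW=100 is sent asynchronously
        System.out.println("consumer asks B1: latest offset = " + b1Hw);  // 100
        // Step 5: B1 fails before the FetchResponse reaches B2, so b2Hw stays 0.
        System.out.println("consumer asks B2: latest offset = " + b2Hw);  // 0
    }
}
{code}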



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1063) run log cleanup at startup

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1063:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> run log cleanup at startup
> --
>
> Key: KAFKA-1063
> URL: https://issues.apache.org/jira/browse/KAFKA-1063
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0
>Reporter: paul mackles
>Assignee: Neha Narkhede
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> Jun suggested I file this ticket to have the brokers run cleanup at 
> startup. Here is the scenario that precipitated it:
> We ran into a situation on our dev cluster (3 nodes, v0.8) where we ran out 
> of disk on one of the nodes. As expected, the broker shut itself down and 
> all of the clients switched over to the other nodes. So far so good. 
> To free up disk space, I reduced log.retention.hours to something more 
> manageable (from 172 to 12). I did this on all 3 nodes. Since the other 2 
> nodes were running OK, I first tried to restart the node which ran out of 
> disk. Unfortunately, it kept shutting itself down due to the full disk. From 
> the logs, I think this was because it was trying to sync-up the replicas it 
> was responsible for and of course couldn't due to the lack of disk space. My 
> hope was that upon restart, it would see the new retention settings and free 
> up a bunch of disk space before trying to do any syncs.
> I then went and restarted the other 2 nodes. They both picked up the new 
> retention settings and freed up a bunch of storage as a result. I then went 
> back and tried to restart the 3rd node but to no avail. It still had problems 
> with the full disks.
> I thought about trying to reassign partitions so that the node in question 
> had less to manage but that turned out to be a hassle so I wound up manually 
> deleting some of the old log/segment files. The broker seemed to come back 
> fine after that but that's not something I would want to do on a production 
> server.
> We obviously need better monitoring/alerting to avoid this situation 
> altogether, but I am wondering if the order of operations at startup 
> could/should be changed to better account for scenarios like this. Or maybe a 
> utility to remove old logs after changing ttl? Did I miss a better way to 
> handle this?
> Original email thread is here:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201309.mbox/%3cce6365ae.82d66%25pmack...@adobe.com%3e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4011) allow sizing RequestQueue in bytes

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-4011:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> allow sizing RequestQueue in bytes
> --
>
> Key: KAFKA-4011
> URL: https://issues.apache.org/jira/browse/KAFKA-4011
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: radai rosenblatt
> Fix For: 0.10.2.0
>
>
> Currently RequestChannel's requestQueue is sized in number of requests:
> {code:title=RequestChannel.scala|borderStyle=solid}
> private val requestQueue = new 
> ArrayBlockingQueue[RequestChannel.Request](queueSize)
> {code}
> Under the assumption that the end goal is a bound on server memory 
> consumption, this requires the admin to know the average request size.
> I would like to propose sizing the requestQueue not by the number of 
> requests but by their accumulated size (Request.buffer.capacity). This would 
> probably make configuring and sizing an instance easier.
> There would need to be a new configuration setting for this 
> (queued.max.bytes?), which could be either in addition to or instead of the 
> current queued.max.requests setting.
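
One possible shape for such a byte-based bound (a sketch under the assumption
of a semaphore-guarded wrapper; the ticket itself does not prescribe an
implementation):

{code}
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map.Entry;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;

// Sketch only: bound the queue by the accumulated buffer capacity of queued
// requests (the queued.max.bytes idea above) rather than by request count.
public class ByteBoundedRequestQueue<R> {
    private final LinkedBlockingQueue<Entry<R, Integer>> queue = new LinkedBlockingQueue<>();
    private final Semaphore byteBudget; // bytes still allowed into the queue

    public ByteBoundedRequestQueue(int maxBytes) {
        this.byteBudget = new Semaphore(maxBytes);
    }

    // Blocks while accepting this request would exceed the byte bound
    // (a request larger than maxBytes would need separate handling).
    public void put(R request, int sizeInBytes) throws InterruptedException {
        byteBudget.acquire(sizeInBytes);
        queue.put(new SimpleImmutableEntry<>(request, sizeInBytes));
    }

    public R take() throws InterruptedException {
        Entry<R, Integer> e = queue.take();
        byteBudget.release(e.getValue()); // free the taken request's byte budget
        return e.getKey();
    }
}
{code}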



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3719) Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too narrow

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3719:
---
Status: Resolved  (was: Closed)

> Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too 
> narrow
> -
>
> Key: KAFKA-3719
> URL: https://issues.apache.org/jira/browse/KAFKA-3719
> Project: Kafka
>  Issue Type: Bug
>Reporter: Balazs Kossovics
>Assignee: Ryan P
>Priority: Trivial
> Fix For: 0.10.1.0
>
>
> In our continuous integration environment, the Kafka brokers run on hosts 
> whose names contain underscores. The current regex incorrectly splits these 
> names into host and port parts.
> I could submit a pull request if someone confirms that this is indeed a bug.
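
Assuming the pattern is exercised through {{Utils.getHost}}/{{Utils.getPort}}
(which parse "host:port" with HOST_PORT_PATTERN and return null when it does
not match), the failure mode would look like this (the hostname below is made
up):

{code}
import org.apache.kafka.common.utils.Utils;

public class HostPortPatternDemo {
    public static void main(String[] args) {
        // A conventional hostname parses fine.
        System.out.println(Utils.getHost("normal-host:9092"));  // normal-host
        // If '_' is missing from the pattern's host character class, a CI host
        // with an underscore fails to match and getHost returns null.
        System.out.println(Utils.getHost("ci_broker_01:9092"));
    }
}
{code}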



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (KAFKA-3719) Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too narrow

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson closed KAFKA-3719.
--

> Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too 
> narrow
> -
>
> Key: KAFKA-3719
> URL: https://issues.apache.org/jira/browse/KAFKA-3719
> Project: Kafka
>  Issue Type: Bug
>Reporter: Balazs Kossovics
>Assignee: Ryan P
>Priority: Trivial
> Fix For: 0.10.1.0
>
>
> In our continuous integration environment, the Kafka brokers run on hosts 
> whose names contain underscores. The current regex incorrectly splits these 
> names into host and port parts.
> I could submit a pull request if someone confirms that this is indeed a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-984) Avoid a full rebalance in cases when a new topic is discovered but container/broker set stay the same

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-984:
--
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Avoid a full rebalance in cases when a new topic is discovered but 
> container/broker set stay the same
> -
>
> Key: KAFKA-984
> URL: https://issues.apache.org/jira/browse/KAFKA-984
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.10.2.0
>
> Attachments: KAFKA-984.v1.patch, KAFKA-984.v2.patch, 
> KAFKA-984.v2.patch
>
>
> Currently a full rebalance is triggered on high-level consumers even when 
> just a new topic is added to ZK. It would be better to avoid this behavior 
> and rebalance only for the newly added topic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-806) Index may not always observe log.index.interval.bytes

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-806:
--
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Index may not always observe log.index.interval.bytes
> -
>
> Key: KAFKA-806
> URL: https://issues.apache.org/jira/browse/KAFKA-806
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Jun Rao
> Fix For: 0.10.2.0
>
>
> Currently, each log.append() adds at most 1 index entry, even when the 
> appended data is larger than log.index.interval.bytes. One potential issue is 
> that if a follower restarts after being down for a long time, it may fetch 
> data much bigger than log.index.interval.bytes at a time. This means that 
> fewer index entries are created, which can increase fetch times for 
> consumers.
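
A small worked sketch of the difference (hypothetical sizes): a follower that
catches up with one big fetch gets a single index entry under the current
at-most-one-per-append rule, where interval-based indexing would create many.

{code}
// Hypothetical illustration, not Kafka code: count index entries for a batch
// of appends under the two policies discussed above.
public class IndexIntervalSketch {
    // Current behavior described in the issue: at most one entry per append.
    static int atMostOnePerAppend(int[] appendSizes, int intervalBytes) {
        int entries = 0, bytesSinceEntry = 0;
        for (int size : appendSizes) {
            bytesSinceEntry += size;
            if (bytesSinceEntry >= intervalBytes) { entries++; bytesSinceEntry = 0; }
        }
        return entries;
    }

    // Strictly observing the interval: one entry per intervalBytes of data.
    static int onePerInterval(int[] appendSizes, int intervalBytes) {
        int entries = 0, bytesSinceEntry = 0;
        for (int size : appendSizes) {
            bytesSinceEntry += size;
            entries += bytesSinceEntry / intervalBytes; // may add several per append
            bytesSinceEntry %= intervalBytes;
        }
        return entries;
    }

    public static void main(String[] args) {
        int[] oneBigCatchUpFetch = {40_960};  // follower appends 40 KB in one go
        System.out.println(atMostOnePerAppend(oneBigCatchUpFetch, 4096)); // 1
        System.out.println(onePerInterval(oneBigCatchUpFetch, 4096));     // 10
    }
}
{code}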



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1342) Slow controlled shutdowns can result in stale shutdown requests

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-1342:
---
Priority: Critical  (was: Blocker)

> Slow controlled shutdowns can result in stale shutdown requests
> ---
>
> Key: KAFKA-1342
> URL: https://issues.apache.org/jira/browse/KAFKA-1342
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Joel Koshy
>Assignee: Joel Koshy
>Priority: Critical
>  Labels: newbie++, newbiee
> Fix For: 0.10.1.0
>
>
> I don't think this is a bug introduced in 0.8.1, but one triggered by the fact
> that controlled shutdown seems to have become slower in 0.8.1 (will file a
> separate ticket to investigate that). When doing a rolling bounce, it is
> possible for a bounced broker to stop all its replica fetchers since the
> previous PID's shutdown requests are still being handled.
> - 515 is the controller
> - Controlled shutdown initiated for 503
> - Controller starts controlled shutdown for 503
> - The controlled shutdown takes a long time in moving leaders and moving
>   follower replicas on 503 to the offline state.
> - So 503's read from the shutdown channel times out and a new channel is
>   created. It issues another shutdown request.  This request (since it is a
>   new channel) is accepted at the controller's socket server but then waits
>   on the broker shutdown lock held by the previous controlled shutdown which
>   is still in progress.
> - The above step repeats for the remaining retries (six more requests).
> - 503 hits SocketTimeout exception on reading the response of the last
>   shutdown request and proceeds to do an unclean shutdown.
> - The controller's onBrokerFailure call-back fires and moves 503's replicas
>   to offline (not too important in this sequence).
> - 503 is brought back up.
> - The controller's onBrokerStartup call-back fires and moves its replicas
>   (and partitions) to online state. 503 starts its replica fetchers.
> - Unfortunately, the (phantom) shutdown requests are still being handled and
>   the controller sends StopReplica requests to 503.
> - The first shutdown request finally finishes (after 76 minutes in my case!).
> - The remaining shutdown requests also execute and do the same thing (sends
>   StopReplica requests for all partitions to
>   503).
> - The remaining requests complete quickly because they end up not having to
>   touch zookeeper paths - no leaders left on the broker and no need to
>   shrink ISR in zookeeper since it has already been done by the first
>   shutdown request.
> - So in the end-state 503 is up, but effectively idle due to the previous
>   PID's shutdown requests.
> There are some obvious fixes that can be made to controlled shutdown to help
> address the above issue. E.g., we don't really need to move follower
> partitions to Offline. We did that as an "optimization" so the broker falls
> out of ISR sooner - which is helpful when producers set required.acks to -1.
> However it adds a lot of latency to controlled shutdown. Also, (more
> importantly) we should have a mechanism to abort any stale shutdown process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1342) Slow controlled shutdowns can result in stale shutdown requests

2016-09-26 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525019#comment-15525019
 ] 

Jason Gustafson commented on KAFKA-1342:


Downgrading this to "Critical" since it has not actually blocked the previous 
few releases and after discussion with [~junrao], it seems unlikely to be fixed 
for 0.10.1.

> Slow controlled shutdowns can result in stale shutdown requests
> ---
>
> Key: KAFKA-1342
> URL: https://issues.apache.org/jira/browse/KAFKA-1342
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Joel Koshy
>Assignee: Joel Koshy
>Priority: Blocker
>  Labels: newbie++, newbiee
> Fix For: 0.10.1.0
>
>
> I don't think this is a bug introduced in 0.8.1, but one triggered by the fact
> that controlled shutdown seems to have become slower in 0.8.1 (will file a
> separate ticket to investigate that). When doing a rolling bounce, it is
> possible for a bounced broker to stop all its replica fetchers since the
> previous PID's shutdown requests are still being handled.
> - 515 is the controller
> - Controlled shutdown initiated for 503
> - Controller starts controlled shutdown for 503
> - The controlled shutdown takes a long time in moving leaders and moving
>   follower replicas on 503 to the offline state.
> - So 503's read from the shutdown channel times out and a new channel is
>   created. It issues another shutdown request.  This request (since it is a
>   new channel) is accepted at the controller's socket server but then waits
>   on the broker shutdown lock held by the previous controlled shutdown which
>   is still in progress.
> - The above step repeats for the remaining retries (six more requests).
> - 503 hits SocketTimeout exception on reading the response of the last
>   shutdown request and proceeds to do an unclean shutdown.
> - The controller's onBrokerFailure call-back fires and moves 503's replicas
>   to offline (not too important in this sequence).
> - 503 is brought back up.
> - The controller's onBrokerStartup call-back fires and moves its replicas
>   (and partitions) to online state. 503 starts its replica fetchers.
> - Unfortunately, the (phantom) shutdown requests are still being handled and
>   the controller sends StopReplica requests to 503.
> - The first shutdown request finally finishes (after 76 minutes in my case!).
> - The remaining shutdown requests also execute and do the same thing (sends
>   StopReplica requests for all partitions to
>   503).
> - The remaining requests complete quickly because they end up not having to
>   touch zookeeper paths - no leaders left on the broker and no need to
>   shrink ISR in zookeeper since it has already been done by the first
>   shutdown request.
> - So in the end-state 503 is up, but effectively idle due to the previous
>   PID's shutdown requests.
> There are some obvious fixes that can be made to controlled shutdown to help
> address the above issue. E.g., we don't really need to move follower
> partitions to Offline. We did that as an "optimization" so the broker falls
> out of ISR sooner - which is helpful when producers set required.acks to -1.
> However it adds a lot of latency to controlled shutdown. Also, (more
> importantly) we should have a mechanism to abort any stale shutdown process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1901: MINOR: Fix Javadoc for KafkaConsumer.poll

2016-09-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1901


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: is this a bug? - message disorder in async send mode -- 0.9.0 java client sdk InFlightRequests

2016-09-26 Thread ????????
hi, Jason
I know the server logic: the server uses selector mute/unmute. Whenever the
socket receives a request, it is muted until the response is returned, at
which point it is unmuted again.
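
A schematic sketch of that mute/unmute idea in plain NIO (a simplification,
not Kafka's actual Selector code):

import java.nio.channels.SelectionKey;

// Simplified sketch, not Kafka's code: once a request has been read from a
// connection, stop polling it for reads (mute); resume only after the
// response for that request has been written (unmute).
final class MuteUnmuteSketch {
    static void mute(SelectionKey key) {
        key.interestOps(key.interestOps() & ~SelectionKey.OP_READ); // no further reads
    }

    static void unmute(SelectionKey key) {
        key.interestOps(key.interestOps() | SelectionKey.OP_READ);  // read next request
    }
}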






-- Original Message --
From: "";
Date: 2016-09-27 (Tue) 11:57
To: "dev";
Subject: Re: is this a bug? - message disorder in async send mode -- 0.9.0 java 
client sdk InFlightRequests



hi, Jason
can you explain the "head of line request blocking" in more detail?   I am very 
curious, thanks! 


below is the code:


class RequestChannel(val numProcessors: Int, val queueSize: Int) extends 
KafkaMetricsGroup {
  private var responseListeners: List[(Int) => Unit] = Nil
  private val requestQueue = new 
ArrayBlockingQueue[RequestChannel.Request](queueSize)
  private val responseQueues = new 
Array[BlockingQueue[RequestChannel.Response]](numProcessors)
  for(i <- 0 until numProcessors)
    responseQueues(i) = new LinkedBlockingQueue[RequestChannel.Response]()

The requestQueue is consumed by multiple threads, so how can it guarantee that
the response order is the same as the request order?

-- Original Message --
From: "Jason Gustafson";
Date: 2016-09-27 (Tue) 2:11
To: "dev";
Subject: Re: is this a bug? - message disorder in async send mode -- 0.9.0 java 
client sdk InFlightRequests



Hi there,

The Kafka server implements head of line request blocking, which means that
it will only handle one request at a time from a given socket. That means that
the responses will always be returned in the same order as the requests
were sent.

-Jason

On Sat, Sep 24, 2016 at 1:19 AM,   wrote:

> We know that in async send mode, Kafka does not guarantee the message
> order, even for the same partition.
>
>
> That is, if we send 3 requests (same topic, same partition) to a
> kafka server in async mode,
> the send order is 1, 2, 3 (correlation ids 1, 2, 3), while the kafka
> server may save the 3 requests in the log in the order 3, 2, 1, and
> return to the client in the order 2, 3, 1?
>
>
> This happens because the Kafka server processes requests with multiple
> threads (multiple KafkaRequestHandlers).
>
>
> If the above is true, the following in the 0.9.0 java client sdk may have a
> problem:
>
>
> In the class NetworkClient, there is a collection inFlightRequests to
> maintain all the in-flight requests:
>
>
> private final InFlightRequests inFlightRequests;
> final class InFlightRequests {
>
> private final int maxInFlightRequestsPerConnection;
> private final Map<String, Deque<ClientRequest>> requests = new
> HashMap<String, Deque<ClientRequest>>();...}
> It uses a Deque to maintain the in-flight requests whose responses have not
> come back.
> Whenever we send a request, we enqueue it, and when the
> response comes back, we dequeue it.
> private void doSend(ClientRequest request, long now) {
> request.setSendTimeMs(now);
> this.inFlightRequests.add(request);
> selector.send(request.request());
> }private void handleCompletedReceives(List<ClientResponse> responses,
> long now) {
> for (NetworkReceive receive : this.selector.completedReceives()) {
> String source = receive.source();
> ClientRequest req = inFlightRequests.completeNext(source);
> ResponseHeader header = ResponseHeader.parse(receive.payload());
> // Always expect the response version id to be the same as the
> request version id
> short apiKey = req.request().header().apiKey();
> short apiVer = req.request().header().apiVersion();
> Struct body = (Struct) ProtoUtils.responseSchema(apiKey,
> apiVer).read(receive.payload());
> correlate(req.request().header(), header);
> if (!metadataUpdater.maybeHandleCompletedReceive(req, now, body))
> responses.add(new ClientResponse(req, now, false, body));
> }
> }
> But if the request order and the response order do not match, is the
> Deque suitable? Or should it use a Map to maintain the requests?
> By the way, in the above, there is a function correlate(xxx) to check the
> match; if they do not match, it throws an exception.private void
> correlate(RequestHeader requestHeader, ResponseHeader responseHeader) {
> if (requestHeader.correlationId() != responseHeader.correlationId())
> throw new IllegalStateException("Correlation id for response (" +
> responseHeader.correlationId()
> + ") does not match request (" +
> requestHeader.correlationId() + ")");
> }
> But in async mode, as mentioned above, the mismatch would be normal and
> likely to happen.
> So is it enough to handle the problem by just throwing an exception?

Build failed in Jenkins: kafka-0.10.1-jdk7 #12

2016-09-26 Thread Apache Jenkins Server
See 

Changes:

[junrao] KAFKA-4214; kafka-reassign-partitions fails all the time when brokers

--
[...truncated 11742 lines...]
:523:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
  if (offsetAndMetadata.expireTimestamp == 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP)

^
:311:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
  if (partitionData.timestamp == 
OffsetCommitRequest.DEFAULT_TIMESTAMP)
 ^
:93:
 class ProducerConfig in package producer is deprecated: This class has been 
deprecated and will be removed in a future release. Please use 
org.apache.kafka.clients.producer.ProducerConfig instead.
val producerConfig = new ProducerConfig(props)
 ^
:94:
 method fetchTopicMetadata in object ClientUtils is deprecated: This method has 
been deprecated and will be removed in a future release.
fetchTopicMetadata(topics, brokers, producerConfig, correlationId)
^
:393:
 constructor UpdateMetadataRequest in class UpdateMetadataRequest is 
deprecated: see corresponding Javadoc for more information.
new UpdateMetadataRequest(controllerId, controllerEpoch, 
liveBrokers.asJava, partitionStates.asJava)
^
:191:
 object ProducerRequestStatsRegistry in package producer is deprecated: This 
object has been deprecated and will be removed in a future release.
ProducerRequestStatsRegistry.removeProducerRequestStats(clientId)
^
:129:
 method readFromReadableChannel in class NetworkReceive is deprecated: see 
corresponding Javadoc for more information.
  response.readFromReadableChannel(channel)
   ^
:311:
 value timestamp in class PartitionData is deprecated: see corresponding 
Javadoc for more information.
  if (partitionData.timestamp == 
OffsetCommitRequest.DEFAULT_TIMESTAMP)
^
:314:
 value timestamp in class PartitionData is deprecated: see corresponding 
Javadoc for more information.
offsetRetention + partitionData.timestamp
^
:553:
 method offsetData in class ListOffsetRequest is deprecated: see corresponding 
Javadoc for more information.
val (authorizedRequestInfo, unauthorizedRequestInfo) = 
offsetRequest.offsetData.asScala.partition {
 ^
:553:
 class PartitionData in object ListOffsetRequest is deprecated: see 
corresponding Javadoc for more information.
val (authorizedRequestInfo, unauthorizedRequestInfo) = 
offsetRequest.offsetData.asScala.partition {

  ^
:558:
 constructor PartitionData in class PartitionData is deprecated: see 
corresponding Javadoc for more information.
  new PartitionData(Errors.TOPIC_AUTHORIZATION_FAILED.code, 
List[JLong]().asJava)
  ^
:583:
 constructor PartitionData in class PartitionData is deprecated: see 
corresponding Javadoc for more information.
(topicPartition, new ListOffsetResponse.PartitionData(Errors.NONE.code, 
offsets.map(new JLong(_)).asJava))
 ^

Re: is this a bug? - message disorder in async send mode -- 0.9.0 java client sdk InFlightRequests

2016-09-26 Thread ????????
hi, Jason
can you explain the "head of line request blocking" in more detail?   I am very 
curious, thanks! 


below is the code:


class RequestChannel(val numProcessors: Int, val queueSize: Int) extends 
KafkaMetricsGroup {
  private var responseListeners: List[(Int) => Unit] = Nil
  private val requestQueue = new 
ArrayBlockingQueue[RequestChannel.Request](queueSize)
  private val responseQueues = new 
Array[BlockingQueue[RequestChannel.Response]](numProcessors)
  for(i <- 0 until numProcessors)
    responseQueues(i) = new LinkedBlockingQueue[RequestChannel.Response]()

The requestQueue is consumed by multiple threads, so how can it guarantee that
the response order is the same as the request order?


-- Original Message --
From: "Jason Gustafson";
Date: 2016-09-27 (Tue) 2:11
To: "dev";
Subject: Re: is this a bug? - message disorder in async send mode -- 0.9.0 java 
client sdk InFlightRequests



Hi there,

The Kafka server implements head of line request blocking, which means that
it will only handle one request at a time from a given socket. That means that
the responses will always be returned in the same order as the requests
were sent.

-Jason

On Sat, Sep 24, 2016 at 1:19 AM,   wrote:

> We know that in async send mode, Kafka does not guarantee the message
> order, even for the same partition.
>
>
> That is, if we send 3 requests (same topic, same partition) to a
> kafka server in async mode,
> the send order is 1, 2, 3 (correlation ids 1, 2, 3), while the kafka
> server may save the 3 requests in the log in the order 3, 2, 1, and
> return to the client in the order 2, 3, 1?
>
>
> This happens because the Kafka server processes requests with multiple
> threads (multiple KafkaRequestHandlers).
>
>
> If the above is true, the following in the 0.9.0 java client sdk may have a
> problem:
>
>
> In the class NetworkClient, there is a collection inFlightRequests to
> maintain all the in-flight requests:
>
>
> private final InFlightRequests inFlightRequests;
> final class InFlightRequests {
>
> private final int maxInFlightRequestsPerConnection;
> private final Map<String, Deque<ClientRequest>> requests = new
> HashMap<String, Deque<ClientRequest>>();...}
> It uses a Deque to maintain the in-flight requests whose responses have not
> come back.
> Whenever we send a request, we enqueue it, and when the
> response comes back, we dequeue it.
> private void doSend(ClientRequest request, long now) {
> request.setSendTimeMs(now);
> this.inFlightRequests.add(request);
> selector.send(request.request());
> }private void handleCompletedReceives(List<ClientResponse> responses,
> long now) {
> for (NetworkReceive receive : this.selector.completedReceives()) {
> String source = receive.source();
> ClientRequest req = inFlightRequests.completeNext(source);
> ResponseHeader header = ResponseHeader.parse(receive.payload());
> // Always expect the response version id to be the same as the
> request version id
> short apiKey = req.request().header().apiKey();
> short apiVer = req.request().header().apiVersion();
> Struct body = (Struct) ProtoUtils.responseSchema(apiKey,
> apiVer).read(receive.payload());
> correlate(req.request().header(), header);
> if (!metadataUpdater.maybeHandleCompletedReceive(req, now, body))
> responses.add(new ClientResponse(req, now, false, body));
> }
> }
> But if the request order and the response order do not match, is the
> Deque suitable? Or should it use a Map to maintain the requests?
> By the way, in the above, there is a function correlate(xxx) to check the
> match; if they do not match, it throws an exception.private void
> correlate(RequestHeader requestHeader, ResponseHeader responseHeader) {
> if (requestHeader.correlationId() != responseHeader.correlationId())
> throw new IllegalStateException("Correlation id for response (" +
> responseHeader.correlationId()
> + ") does not match request (" +
> requestHeader.correlationId() + ")");
> }
> But in async mode, as mentioned above, the mismatch would be normal and
> likely to happen.
> So is it enough to handle the problem by just throwing an exception?

[jira] [Assigned] (KAFKA-4165) Add 0.10.0.1 as a source for compatibility tests where relevant

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-4165:
--

Assignee: Jason Gustafson

> Add 0.10.0.1 as a source for compatibility tests where relevant
> ---
>
> Key: KAFKA-4165
> URL: https://issues.apache.org/jira/browse/KAFKA-4165
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> We have a few compatibility tests: message_format_change_test.py, 
> compatibility_test_new_broker_test.py, upgrade_test.py that don't currently 
> test with 0.10.0.1 as the source. We will probably update upgrade_test as 
> part of the cluster id work, but we need to check if any other ones need to 
> be updated too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3965) Mirror maker sync send fails will lose messages

2016-09-26 Thread NieWang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524965#comment-15524965
 ] 

NieWang commented on KAFKA-3965:


The mirror maker version that I am running is 0.10.0.0.

You can follow the steps I described to verify the problem.

> Mirror maker sync send fails will lose messages
> ---
>
> Key: KAFKA-3965
> URL: https://issues.apache.org/jira/browse/KAFKA-3965
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: NieWang
> Fix For: 0.10.1.0
>
>
> 1. The source Kafka cluster stores some messages, each 500 bytes in size.
> 2. The mirror maker producer is configured with:
>  producer.type=sync
>  max.request.size=400
> 3. Start the mirror maker to back up messages from the source Kafka cluster
> to the destination Kafka cluster; the mirror maker quits because the
> messages are larger than 400 bytes.
> 4. Checking the source Kafka cluster shows the consumed offset has been
> advanced to 1.
> 5. Checking the destination Kafka cluster shows no messages at all.
> 6. Remove the producer config max.request.size=400 and start the mirror
> maker again. After the backup finishes, the first message has been lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4222) Transient failure in QueryableStateIntegrationTest.queryOnRebalance

2016-09-26 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-4222:
--

 Summary: Transient failure in 
QueryableStateIntegrationTest.queryOnRebalance
 Key: KAFKA-4222
 URL: https://issues.apache.org/jira/browse/KAFKA-4222
 Project: Kafka
  Issue Type: Bug
  Components: streams, unit tests
Reporter: Jason Gustafson
Assignee: Guozhang Wang
 Fix For: 0.10.1.0


Seen here: https://builds.apache.org/job/kafka-trunk-jdk8/915/console

{code}
org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] FAILED
java.lang.AssertionError: Condition not met within timeout 3. waiting 
for metadata, store and value to be non null
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:276)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.verifyAllKVKeys(QueryableStateIntegrationTest.java:263)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:342)
{code}
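
The assertion above goes through the project's polling helper; a minimal sketch
of that pattern (the metadata, store, and value fields and timeoutMs are
hypothetical stand-ins for the real lookups in verifyAllKVKeys):

{code}
// assumes org.apache.kafka.test.TestUtils and org.apache.kafka.test.TestCondition
TestUtils.waitForCondition(new TestCondition() {
    @Override
    public boolean conditionMet() {
        return metadata != null && store != null && value != null;
    }
}, timeoutMs, "waiting for metadata, store and value to be non null");
{code}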



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1914: MINOR: Make new consumer default for Mirror Maker

2016-09-26 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1914

MINOR: Make new consumer default for Mirror Maker



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka mm-default-new-consumer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1914.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1914


commit e58dceb16c0603970a8116d1c53cea51bad4817e
Author: Jason Gustafson 
Date:   2016-09-27T03:35:01Z

MINOR: Make new consumer default for Mirror Maker




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-73: Replication Quotas

2016-09-26 Thread Mayuresh Gharat
Hi Ben,

Just for my understanding :

When you said :

What we really want to do is apply:
LeaderThrottle [104,107,109]
FollowerThrottle [105,113]

---> This means that we want to apply the leader throttle only to the replica
that is the leader out of 104, 107, 109, right?

Also when you said :

[104,107,109,105,113] which will mean the regular replication traffic
between (say 107 is leader) 107->104 and 107->109 will be throttled also.
---> By regular traffic you mean traffic for partition 0, right?
Also, the only reason 107->104 and 107->109 would get throttled is because
of the follower throttle, right?

Thanks,

Mayuresh

On Mon, Sep 26, 2016 at 7:13 PM, Neha Narkhede  wrote:

> Makes sense to me. Thanks Ben!
>
> On Mon, Sep 26, 2016 at 9:39 AM Jun Rao  wrote:
>
> > Yes, this change makes sense to me since it gives the admin better
> control.
> >
> > Thanks,
> >
> > Jun
> >
> > On Sun, Sep 25, 2016 at 7:35 AM, Ben Stopford  wrote:
> >
> > > Hi All
> > >
> > > We’ve made an adjustment to KIP-73: Replication Quotas which I’d like
> to
> > > open up to the community for approval.
> > >
> > > Previously the admin passed a list of replicas to be throttled:
> > >
> > > quota.replication.throttled.replicas =
> > > [partId]:[replica],[partId]:[replica],[partId]:[replica] etc
> > >
> > > The change is to split this into two properties. One that corresponds
> to
> > > the leader-side throttle, and the other that corresponds to the
> > > follower-side throttle.
> > >
> > > quota.leader.replication.throttled.replicas =
> > > [partId]:[replica],[partId]:[replica],[partId]:[replica]
> > > quota.follower.replication.throttled.replicas =
> > > [partId]:[replica],[partId]:[replica],[partId]:[replica]
> > >
> > > This provides more control over the throttling process. It also helps
> us
> > > with a common use case which I’ve described below, for those
> interested.
> > >
> > > Please reply if you have any comments / issues / suggestions.
> > >
> > > Thanks as ever.
> > >
> > > Ben
> > >
> > >
> > > Sample Use Case:
> > >
> > > Say we are moving partition 0. It has replicas [104,107,109] moving to
> > > [105,107,113]
> > >
> > > So the leader could be any of [104,107,109] and we know we will be
> > > creating new replicas on 105 & 113.
> > >
> > > In the current mechanism, all we can do is apply both leader and
> follower
> > > throttles to all 5:  [104,107,109,105,113] which will mean the regular
> > > replication traffic between (say 107 is leader) 107->104 and 107->109
> > will
> > > be throttled also. This is potentially problematic.
> > >
> > > What we really want to do is apply:
> > >
> > > LeaderThrottle [104,107,109]
> > > FollowerThrottle [105,113]
> > >
> > > This way, during a rebalance, we ensure that standard replication traffic will
> > > not be throttled, but the rebalance will perform correctly if leaders
> > > move. One subtlety is that, should the leader move to the “move
> > > destination” node, it would no longer be throttled. But this is
> actually
> > to
> > > our benefit in the normal case.
> > >
> > >
> >
> --
> Thanks,
> Neha
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125
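
For concreteness, Ben's sample use case quoted above maps onto the two new
properties roughly as follows (illustrative values only, using the
[partId]:[replica] format from the KIP, with partition 0 as the moving partition):

quota.leader.replication.throttled.replicas = 0:104,0:107,0:109
quota.follower.replication.throttled.replicas = 0:105,0:113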


Build failed in Jenkins: kafka-trunk-jdk8 #915

2016-09-26 Thread Apache Jenkins Server
See 

Changes:

[jason] KAFKA-3831; Prepare for updating new-consumer-based Mirror Maker's

--
[...truncated 13543 lines...]
org.apache.kafka.streams.processor.internals.StreamThreadTest > testMaybeCommit 
PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testPartitionAssignmentChange STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testPartitionAssignmentChange PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > testMaybeClean 
STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > testMaybeClean 
PASSED

org.apache.kafka.streams.processor.internals.MinTimestampTrackerTest > 
testTracking STARTED

org.apache.kafka.streams.processor.internals.MinTimestampTrackerTest > 
testTracking PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldThrowExceptionIfApplicationServerConfigIsNotHostPortPair STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldThrowExceptionIfApplicationServerConfigIsNotHostPortPair PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldMapUserEndPointToTopicPartitions STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldMapUserEndPointToTopicPartitions PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldAddUserDefinedEndPointToSubscription STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldAddUserDefinedEndPointToSubscription PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithStandbyReplicas STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithStandbyReplicas PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithNewTasks STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithNewTasks PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldExposeHostStateToTopicPartitionsOnAssignment STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldExposeHostStateToTopicPartitionsOnAssignment PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithStates STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithStates PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldThrowExceptionIfApplicationServerConfigPortIsNotAnInteger STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldThrowExceptionIfApplicationServerConfigPortIsNotAnInteger PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldReturnEmptyClusterMetadataIfItHasntBeenBuilt STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldReturnEmptyClusterMetadataIfItHasntBeenBuilt PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignBasic STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignBasic PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testOnAssignment STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testOnAssignment PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithInternalTopicThatsSourceIsAnotherInternalTopic STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithInternalTopicThatsSourceIsAnotherInternalTopic PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testSubscription STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testSubscription PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldSetClusterMetadataOnAssignment STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldSetClusterMetadataOnAssignment PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithInternalTopics STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithInternalTopics PASSED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotAllowNullProcessorSupplier STARTED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotAllowNullProcessorSupplier PASSED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotSetApplicationIdToNull STARTED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotSetApplicationIdToNull PASSED


[jira] [Commented] (KAFKA-3175) topic not accessible after deletion even when delete.topic.enable is disabled

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524915#comment-15524915
 ] 

ASF GitHub Bot commented on KAFKA-3175:
---

GitHub user MayureshGharat opened a pull request:

https://github.com/apache/kafka/pull/1913

KAFKA-3175 (Rebased) : topic not accessible after deletion even when 
delete.topic.enable is disabled

Rebased the patch with current trunk.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MayureshGharat/kafka kafka-3175_latest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1913.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1913


commit 3b4412eb948318ac8fbea855ff52071bbb6f8fb3
Author: MayureshGharat 
Date:   2016-02-01T21:51:57Z

Remove topics under /admin/delete_topics path in zk if deleteTopic is 
disabled. The topic should never be enqueued for deletion

commit 1bf29ae00da0eaa0044bdc1bb8dc6cc80ee90951
Author: MayureshGharat 
Date:   2016-03-11T17:25:32Z

Addressed Jun's comment to clean the zk state for a topic on cluster 
restart and delete topic disabled




> topic not accessible after deletion even when delete.topic.enable is disabled
> -
>
> Key: KAFKA-3175
> URL: https://issues.apache.org/jira/browse/KAFKA-3175
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Mayuresh Gharat
> Fix For: 0.10.1.0
>
>
> This can be reproduced with the following steps.
> 1. start ZK and 1 broker (with default delete.topic.enable=false)
> 2. create a topic test
> bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test 
> --partition 1 --replication-factor 1
> 3. delete topic test
> bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test
> 4. restart the broker
> Now topic test still shows up during topic description.
> bin/kafka-topics.sh --zookeeper localhost:2181 --describe
> Topic:test    PartitionCount:1    ReplicationFactor:1    Configs:
>     Topic: test    Partition: 0    Leader: 0    Replicas: 0    Isr: 0
> However, one can't produce to this topic any more.
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> [2016-01-29 17:55:24,527] WARN Error while fetching metadata with correlation 
> id 0 : {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
> [2016-01-29 17:55:24,725] WARN Error while fetching metadata with correlation 
> id 1 : {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
> [2016-01-29 17:55:24,828] WARN Error while fetching metadata with correlation 
> id 2 : {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1913: KAFKA-3175 (Rebased) : topic not accessible after ...

2016-09-26 Thread MayureshGharat
GitHub user MayureshGharat opened a pull request:

https://github.com/apache/kafka/pull/1913

KAFKA-3175 (Rebased) : topic not accessible after deletion even when 
delete.topic.enable is disabled

Rebased the patch with current trunk.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MayureshGharat/kafka kafka-3175_latest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1913.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1913


commit 3b4412eb948318ac8fbea855ff52071bbb6f8fb3
Author: MayureshGharat 
Date:   2016-02-01T21:51:57Z

Remove topics under /admin/delete_topics path in zk if deleteTopic is 
disabled. The topic should never be enqueued for deletion

commit 1bf29ae00da0eaa0044bdc1bb8dc6cc80ee90951
Author: MayureshGharat 
Date:   2016-03-11T17:25:32Z

Addressed Jun's comment to clean the zk state for a topic on cluster 
restart and delete topic disabled




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk8 #914

2016-09-26 Thread Apache Jenkins Server
See 

Changes:

[junrao] KAFKA-4214; kafka-reassign-partitions fails all the time when brokers

--
[...truncated 6124 lines...]

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoConsumeAcl PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testProduceConsumeViaAssign 
STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testProduceConsumeViaAssign 
PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoProduceAcl STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoProduceAcl PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoGroupAcl STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoGroupAcl PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testProduceConsumeViaSubscribe STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testProduceConsumeViaSubscribe PASSED

kafka.api.UserQuotaTest > testProducerConsumerOverrideUnthrottled STARTED

kafka.api.UserQuotaTest > testProducerConsumerOverrideUnthrottled PASSED

kafka.api.UserQuotaTest > testThrottledProducerConsumer STARTED

kafka.api.UserQuotaTest > testThrottledProducerConsumer PASSED

kafka.api.UserQuotaTest > testQuotaOverrideDelete STARTED

kafka.api.UserQuotaTest > testQuotaOverrideDelete PASSED

kafka.api.PlaintextConsumerTest > testEarliestOrLatestOffsets STARTED

kafka.api.PlaintextConsumerTest > testEarliestOrLatestOffsets PASSED

kafka.api.PlaintextConsumerTest > testPartitionsForAutoCreate STARTED

kafka.api.PlaintextConsumerTest > testPartitionsForAutoCreate PASSED

kafka.api.PlaintextConsumerTest > testShrinkingTopicSubscriptions STARTED

kafka.api.PlaintextConsumerTest > testShrinkingTopicSubscriptions PASSED

kafka.api.PlaintextConsumerTest > testMaxPollIntervalMs STARTED

kafka.api.PlaintextConsumerTest > testMaxPollIntervalMs PASSED

kafka.api.PlaintextConsumerTest > testOffsetsForTimes STARTED

kafka.api.PlaintextConsumerTest > testOffsetsForTimes PASSED

kafka.api.PlaintextConsumerTest > testSubsequentPatternSubscription STARTED

kafka.api.PlaintextConsumerTest > testSubsequentPatternSubscription PASSED

kafka.api.PlaintextConsumerTest > testAsyncCommit STARTED

kafka.api.PlaintextConsumerTest > testAsyncCommit PASSED

kafka.api.PlaintextConsumerTest > testLowMaxFetchSizeForRequestAndPartition 
STARTED

kafka.api.PlaintextConsumerTest > testLowMaxFetchSizeForRequestAndPartition 
PASSED

kafka.api.PlaintextConsumerTest > testMultiConsumerSessionTimeoutOnStopPolling 
STARTED

kafka.api.PlaintextConsumerTest > testMultiConsumerSessionTimeoutOnStopPolling 
PASSED

kafka.api.PlaintextConsumerTest > testMaxPollIntervalMsDelayInRevocation STARTED

kafka.api.PlaintextConsumerTest > testMaxPollIntervalMsDelayInRevocation PASSED

kafka.api.PlaintextConsumerTest > testPartitionsForInvalidTopic STARTED

kafka.api.PlaintextConsumerTest > testPartitionsForInvalidTopic PASSED

kafka.api.PlaintextConsumerTest > testPauseStateNotPreservedByRebalance STARTED

kafka.api.PlaintextConsumerTest > testPauseStateNotPreservedByRebalance PASSED

kafka.api.PlaintextConsumerTest > 
testFetchHonoursFetchSizeIfLargeRecordNotFirst STARTED

kafka.api.PlaintextConsumerTest > 
testFetchHonoursFetchSizeIfLargeRecordNotFirst PASSED

kafka.api.PlaintextConsumerTest > testSeek STARTED

kafka.api.PlaintextConsumerTest > testSeek PASSED

kafka.api.PlaintextConsumerTest > testPositionAndCommit STARTED

kafka.api.PlaintextConsumerTest > testPositionAndCommit PASSED

kafka.api.PlaintextConsumerTest > 
testFetchRecordLargerThanMaxPartitionFetchBytes STARTED

kafka.api.PlaintextConsumerTest > 
testFetchRecordLargerThanMaxPartitionFetchBytes PASSED

kafka.api.PlaintextConsumerTest > testUnsubscribeTopic STARTED

kafka.api.PlaintextConsumerTest > testUnsubscribeTopic PASSED

kafka.api.PlaintextConsumerTest > testMultiConsumerSessionTimeoutOnClose STARTED

kafka.api.PlaintextConsumerTest > testMultiConsumerSessionTimeoutOnClose PASSED

kafka.api.PlaintextConsumerTest > testFetchRecordLargerThanFetchMaxBytes STARTED

kafka.api.PlaintextConsumerTest > testFetchRecordLargerThanFetchMaxBytes PASSED

kafka.api.PlaintextConsumerTest > testMultiConsumerDefaultAssignment STARTED

kafka.api.PlaintextConsumerTest > testMultiConsumerDefaultAssignment PASSED

kafka.api.PlaintextConsumerTest > testAutoCommitOnClose STARTED

kafka.api.PlaintextConsumerTest > testAutoCommitOnClose PASSED

kafka.api.PlaintextConsumerTest > testListTopics STARTED

kafka.api.PlaintextConsumerTest > testListTopics PASSED

kafka.api.PlaintextConsumerTest > testExpandingTopicSubscriptions STARTED
ERROR: Could not install GRADLE_2_4_RC_2_HOME
java.lang.NullPointerException

kafka.api.PlaintextConsumerTest > testExpandingTopicSubscriptions PASSED

kafka.api.PlaintextConsumerTest > testInterceptors STARTED

kafka.api.PlaintextConsumerTest > testInterceptors PASSED

kafka.api.PlaintextConsumerTest > testPatternUnsubscription STARTED


Re: [DISCUSS] KIP-73: Replication Quotas

2016-09-26 Thread Neha Narkhede
Makes sense to me. Thanks Ben!

On Mon, Sep 26, 2016 at 9:39 AM Jun Rao  wrote:

> Yes, this change makes sense to me since it gives the admin better control.
>
> Thanks,
>
> Jun
>
> On Sun, Sep 25, 2016 at 7:35 AM, Ben Stopford  wrote:
>
> > Hi All
> >
> > We’ve made an adjustment to KIP-73: Replication Quotas which I’d like to
> > open up to the community for approval.
> >
> > Previously the admin passed a list of replicas to be throttled:
> >
> > quota.replication.throttled.replicas =
> > [partId]:[replica],[partId]:[replica],[partId]:[replica] etc
> >
> > The change is to split this into two properties. One that corresponds to
> > the leader-side throttle, and the other that corresponds to the
> > follower-side throttle.
> >
> > quota.leader.replication.throttled.replicas =
> > [partId]:[replica],[partId]:[replica],[partId]:[replica]
> > quota.follower.replication.throttled.replicas =
> > [partId]:[replica],[partId]:[replica],[partId]:[replica]
> >
> > This provides more control over the throttling process. It also helps us
> > with a common use case which I’ve described below, for those interested.
> >
> > Please reply if you have any comments / issues / suggestions.
> >
> > Thanks as ever.
> >
> > Ben
> >
> >
> > Sample Use Case:
> >
> > Say we are moving partition 0. It has replicas [104,107,109] moving to
> > [105,107,113]
> >
> > So the leader could be any of [104,107,109] and we know we will be
> > creating new replicas on 105 & 113.
> >
> > In the current mechanism, all we can do is apply both leader and follower
> > throttles to all 5:  [104,107,109,105,113] which will mean the regular
> > replication traffic between (say 107 is leader) 107->104 and 107->109
> will
> > be throttled also. This is potentially problematic.
> >
> > What we really want to do is apply:
> >
> > LeaderThrottle [104,107,109]
> > FollowerThrottle [105,113]
> >
> > This way, during a rebalance, we ensure that standard replication traffic will
> > not be throttled, but the rebalance will perform correctly if leaders
> > move. One subtlety is that, should the leader move to the “move
> > destination” node, it would no longer be throttled. But this is actually
> to
> > our benefit in the normal case.
> >
> >
>
-- 
Thanks,
Neha


[jira] [Updated] (KAFKA-3831) Preparation for updating the default partition assignment strategy of Mirror Maker to round robin

2016-09-26 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3831:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Preparation for updating the default partition assignment strategy of Mirror 
> Maker to round robin
> -
>
> Key: KAFKA-3831
> URL: https://issues.apache.org/jira/browse/KAFKA-3831
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Minor
>
> Add proper warnings and make necessary doc changes for updating the default 
> partition assignment strategy of Mirror Maker from range to round robin. The 
> actual switch would occur as part of a major release cycle (to be scheduled).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3831) Preparation for updating the default partition assignment strategy of Mirror Maker to round robin

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524791#comment-15524791
 ] 

ASF GitHub Bot commented on KAFKA-3831:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1499


> Preparation for updating the default partition assignment strategy of Mirror 
> Maker to round robin
> -
>
> Key: KAFKA-3831
> URL: https://issues.apache.org/jira/browse/KAFKA-3831
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Minor
>
> Add proper warnings and make necessary doc changes for updating the default 
> partition assignment strategy of Mirror Maker from range to round robin. The 
> actual switch would occur as part of a major release cycle (to be scheduled).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1499: KAFKA-3831: Prepare for updating new-consumer-base...

2016-09-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1499


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk7 #1571

2016-09-26 Thread Apache Jenkins Server
See 

Changes:

[junrao] KAFKA-4214; kafka-reassign-partitions fails all the time when brokers

--
[...truncated 3611 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED


Build failed in Jenkins: kafka-0.10.1-jdk7 #11

2016-09-26 Thread Apache Jenkins Server
See 

Changes:

[me] MINOR: Wakeups propagated from commitOffsets in WorkerSinkTask should be

--
[...truncated 7390 lines...]

kafka.network.SocketServerTest > tooBigRequestIsRejected STARTED

kafka.network.SocketServerTest > tooBigRequestIsRejected PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslSslTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.SaslSslTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.SaslSslTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslSslTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.SaslSslTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslSslTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.SaslSslTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslSslTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
STARTED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
PASSED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionDisabled 
STARTED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionDisabled 
PASSED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionInvalidTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionInvalidTopicOverride PASSED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionEnabledByTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionEnabledByTopicOverride PASSED

kafka.integration.MinIsrConfigTest > testDefaultKafkaConfig STARTED

kafka.integration.MinIsrConfigTest > testDefaultKafkaConfig PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 

[jira] [Updated] (KAFKA-4201) Add an --assignment-strategy option to new-consumer-based Mirror Maker

2016-09-26 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian updated KAFKA-4201:
---
Status: Patch Available  (was: Open)

> Add an --assignment-strategy option to new-consumer-based Mirror Maker
> --
>
> Key: KAFKA-4201
> URL: https://issues.apache.org/jira/browse/KAFKA-4201
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> The default assignment strategy in mirror maker will be changed from range to 
> round robin in an upcoming release ([see 
> KAFKA-3818|https://issues.apache.org/jira/browse/KAFKA-3818]). In order to 
> make it easier for users to change the assignment strategy, add an 
> {{--assignment-strategy}} option to Mirror Maker command line tool.
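
For illustration, a hypothetical invocation once this lands might look like the
following (the new option name is taken from the PR description and may differ
in the final patch; the other flags are existing Mirror Maker options):

bin/kafka-mirror-maker.sh --new.consumer \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist ".*" \
  --assignment.strategy org.apache.kafka.clients.consumer.RoundRobinAssignor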



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4201) Add an --assignment-strategy option to new-consumer-based Mirror Maker

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524638#comment-15524638
 ] 

ASF GitHub Bot commented on KAFKA-4201:
---

GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/1912

KAFKA-4201: Add an `assignment.strategy` option to new-consumer-based 
Mirror Maker

A command line option `assignment.strategy` is added to new-consumer-based 
Mirror Maker that overwrites any assignment strategy specified in the 
new-consumer config.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-4201

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1912.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1912


commit 502af929addcadf27a734f819e29fc29b8a29264
Author: Vahid Hashemian 
Date:   2016-09-27T00:35:00Z

KAFKA-4201: Add an `assignment.strategy` option to new-consumer-based 
Mirror Maker

A command line option `assignment.strategy` is added to new-consumer-based 
Mirror Maker that overwrites any assignment strategy specified in the 
new-consumer config.




> Add an --assignment-strategy option to new-consumer-based Mirror Maker
> --
>
> Key: KAFKA-4201
> URL: https://issues.apache.org/jira/browse/KAFKA-4201
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> The default assignment strategy in mirror maker will be changed from range to 
> round robin in an upcoming release ([see 
> KAFKA-3818|https://issues.apache.org/jira/browse/KAFKA-3818]). In order to 
> make it easier for users to change the assignment strategy, add an 
> {{--assignment-strategy}} option to Mirror Maker command line tool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1912: KAFKA-4201: Add an `assignment.strategy` option to...

2016-09-26 Thread vahidhashemian
GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/1912

KAFKA-4201: Add an `assignment.strategy` option to new-consumer-based 
Mirror Maker

A command line option `assignment.strategy` is added to new-consumer-based 
Mirror Maker that overwrites any assignment strategy specified in the 
new-consumer config.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-4201

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1912.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1912


commit 502af929addcadf27a734f819e29fc29b8a29264
Author: Vahid Hashemian 
Date:   2016-09-27T00:35:00Z

KAFKA-4201: Add an `assignment.strategy` option to new-consumer-based 
Mirror Maker

A command line option `assignment.strategy` is added to new-consumer-based 
Mirror Maker that overwrites any assignment strategy specified in the 
new-consumer config.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk7 #1570

2016-09-26 Thread Apache Jenkins Server
See 

Changes:

[me] MINOR: Wakeups propagated from commitOffsets in WorkerSinkTask should be

--
[...truncated 7403 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED


[jira] [Commented] (KAFKA-4214) kafka-reassign-partitions fails all the time when brokers are bounced during reassignment

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524590#comment-15524590
 ] 

ASF GitHub Bot commented on KAFKA-4214:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1910


> kafka-reassign-partitions fails all the time when brokers are bounced during 
> reassignment
> -
>
> Key: KAFKA-4214
> URL: https://issues.apache.org/jira/browse/KAFKA-4214
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 0.10.1.0
>
>
> Due to KAFKA-4204, we never realized that the existing system test for 
> testing reassignment would always fail when brokers were bounced mid-process. 
> This happens reliably, even for topics with varying numbers of partitions 
> and varying replication factors.
> In particular, we see errors like this in the logs when the brokers are 
> bounced: 
> {noformat}
> Status of partition reassignment:
> ERROR: Assigned replicas (1,2) don't match the list of replicas for 
> reassignment (1) for partition [test_topic,2]
> Reassignment of partition [test_topic,1] completed successfully
> Reassignment of partition [test_topic,2] failed
> Reassignment of partition [test_topic,3] completed successfully
> Reassignment of partition [test_topic,0] is still in progress
> {noformat}
> Currently, the tests which bounce brokers during reassignment are disabled 
> until this bug is fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4214) kafka-reassign-partitions fails all the time when brokers are bounced during reassignment

2016-09-26 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-4214.

   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1910
[https://github.com/apache/kafka/pull/1910]

> kafka-reassign-partitions fails all the time when brokers are bounced during 
> reassignment
> -
>
> Key: KAFKA-4214
> URL: https://issues.apache.org/jira/browse/KAFKA-4214
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 0.10.1.0
>
>
> Due to KAFKA-4204, we never realized that the existing system test for 
> testing reassignment would always fail when brokers were bounced mid-process. 
> This happens reliably, even for topics with varying numbers of partitions 
> and varying replication factors.
> In particular, we see errors like this in the logs when the brokers are 
> bounced: 
> {noformat}
> Status of partition reassignment:
> ERROR: Assigned replicas (1,2) don't match the list of replicas for 
> reassignment (1) for partition [test_topic,2]
> Reassignment of partition [test_topic,1] completed successfully
> Reassignment of partition [test_topic,2] failed
> Reassignment of partition [test_topic,3] completed successfully
> Reassignment of partition [test_topic,0] is still in progress
> {noformat}
> Currently, the tests which bounce brokers during reassignment are disabled 
> until this bug is fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1910: KAFKA-4214:kafka-reassign-partitions fails all the...

2016-09-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1910


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-trunk-jdk8 #913

2016-09-26 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3829) Warn that kafka-connect group.id must not conflict with connector names

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524540#comment-15524540
 ] 

ASF GitHub Bot commented on KAFKA-3829:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1911

KAFKA-3829: Ensure valid configuration prior to creating connector



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-3829

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1911.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1911


commit 470d7cdbc76ff53476e314c057d3a8c891bd622a
Author: Jason Gustafson 
Date:   2016-09-26T23:48:19Z

KAFKA-3829: Ensure valid configuration prior to creating connector




> Warn that kafka-connect group.id must not conflict with connector names
> ---
>
> Key: KAFKA-3829
> URL: https://issues.apache.org/jira/browse/KAFKA-3829
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Barry Kaplan
>Assignee: Jason Gustafson
>Priority: Critical
>  Labels: documentation
> Fix For: 0.10.1.0
>
>
> If the group.id value happens to have the same value as a connector name, the 
> following error will be issued:
> {quote}
> Attempt to join group connect-elasticsearch-indexer failed due to: The group 
> member's supported protocols are incompatible with those of existing members.
> {quote}
> Maybe the documentation for Distributed Worker Configuration group.id could 
> be worded:
> {quote}
> A unique string that identifies the Connect cluster group this worker belongs 
> to. This value must be different from all connector configuration 'name' 
> properties.
> {quote}
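
To make the conflict concrete, a hypothetical worker setup that triggers the
quoted error:

# distributed worker config
group.id=elasticsearch-indexer
# ...while a connector is submitted with the same name:
# { "name": "elasticsearch-indexer", ... }
# The worker's group and the connector's consumer group then collide, producing
# the "supported protocols are incompatible" join failure quoted above.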



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1911: KAFKA-3829: Ensure valid configuration prior to cr...

2016-09-26 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1911

KAFKA-3829: Ensure valid configuration prior to creating connector



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-3829

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1911.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1911


commit 470d7cdbc76ff53476e314c057d3a8c891bd622a
Author: Jason Gustafson 
Date:   2016-09-26T23:48:19Z

KAFKA-3829: Ensure valid configuration prior to creating connector




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4212) Add a key-value store that is a TTL persistent cache

2016-09-26 Thread Elias Levy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524513#comment-15524513
 ] 

Elias Levy commented on KAFKA-4212:
---

More generally, the use case is: I've told a bunch of folks about some property 
of some type of object, and I would like to notify those folks every time that 
property changes for the specific objects they have asked about, during some 
configurable time period since the last time they asked me about the object.

> Add a key-value store that is a TTL persistent cache
> 
>
> Key: KAFKA-4212
> URL: https://issues.apache.org/jira/browse/KAFKA-4212
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Elias Levy
>Assignee: Guozhang Wang
>
> Some jobs needs to maintain as state a large set of key-values for some 
> period of time.  I.e. they need to maintain a TTL cache of values potentially 
> larger than memory. 
> Currently Kafka Streams provides non-windowed and windowed key-value stores.  
> Neither is an exact fit to this use case.  
> The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as 
> required, but does not support expiration.  The TTL option of RocksDB is 
> explicitly not used.
> The {{RocksDBWindowsStore}}, a {{WindowsStore}}, can expire items via segment 
> dropping, but it stores multiple items per key, based on their timestamp.  
> But this store can be repurposed as a cache by fetching the items in reverse 
> chronological order and returning the first item found.
> KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here 
> we desire a variable-capacity memory-overflowing TTL caching store.
> Although {{RocksDBWindowsStore}} can be repurposed as a cache, it would be 
> useful to have an official and proper TTL cache API and implementation.
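
As a rough illustration of the repurposing described above (a sketch only,
assuming the 0.10.x WindowStore API; the class and variable names are
hypothetical), a TTL lookup can scan the key's window and return the newest entry:

{code}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

// Treats a WindowStore as a TTL cache: the newest value written for a key
// within the last ttlMs wins; older entries expire when the store drops
// their segments.
public final class TtlCache<K, V> {
    private final WindowStore<K, V> store;
    private final long ttlMs;

    public TtlCache(WindowStore<K, V> store, long ttlMs) {
        this.store = store;
        this.ttlMs = ttlMs;
    }

    public V get(K key, long nowMs) {
        // The iterator is not guaranteed to be in reverse chronological order,
        // so scan the TTL window and keep the entry with the largest timestamp.
        try (WindowStoreIterator<V> iter = store.fetch(key, nowMs - ttlMs, nowMs)) {
            V newest = null;
            long newestTs = Long.MIN_VALUE;
            while (iter.hasNext()) {
                KeyValue<Long, V> entry = iter.next();
                if (entry.key >= newestTs) {
                    newestTs = entry.key;
                    newest = entry.value;
                }
            }
            return newest; // null: never written, or expired out of the window
        }
    }
}
{code}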



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3919) Broker fails to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-09-26 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524480#comment-15524480
 ] 

Jun Rao commented on KAFKA-3919:


[~BigAndy], the error trace in the description of the jira only happens when 
the broker does log recovery due to ungraceful shutdown. So, it seems some 
ungraceful shutdown had occurred. Now, assuming there was no data loss during 
ungraceful shutdown, I am not sure frequent leader election or ISR changes 
themselves can lead to the reported issue.

If there was data loss due to ungraceful shutdown, this issue can happen 
regardless of whether the producer uses ack = 1 or all. KAFKA-1211 in general 
fixes a data loss issue when the leader changes too frequently, but also 
addresses this particular issue. We have made a more concrete proposal in 
KAFKA-1211 for a fix. Perhaps you can see if it makes sense to you and/or 
whether you'd be able to help out on the fix.

> Broker fails to start after ungraceful shutdown due to non-monotonically 
> incrementing offsets in logs
> --
>
> Key: KAFKA-3919
> URL: https://issues.apache.org/jira/browse/KAFKA-3919
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Andy Coates
>
> Hi All,
> I encountered an issue with Kafka following a power outage that saw a 
> proportion of our cluster disappear. When the power came back on several 
> brokers halted on start up with the error:
> {noformat}
>   Fatal error during KafkaServerStartable startup. Prepare to shutdown”
>   kafka.common.InvalidOffsetException: Attempt to append an offset 
> (1239742691) to position 35728 no larger than the last offset appended 
> (1239742822) to 
> /data3/kafka/mt_xp_its_music_main_itsevent-20/001239444214.index.
>   at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
>   at kafka.log.LogSegment.recover(LogSegment.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at kafka.log.Log.loadSegments(Log.scala:160)
>   at kafka.log.Log.(Log.scala:90)
>   at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The only way to recover the brokers was to delete the log files that 
> contained non-monotonically incrementing offsets.
> We've spent some time digging through the logs and I feel I may have worked 
> out the sequence of events leading to this issue (though this is based on 
> some assumptions I've made about the way Kafka is working, which may be 
> wrong).
> First off, we have unclean leadership elections disabled. (We did later enable 
> them to help get around some other issues we were having, but this was 
> several hours after this issue manifested.) We're producing to the topic 
> with gzip compression and acks=1.
> We looked through the data logs that were causing the brokers to not start. 
> What we found is that the initial part of the log has monotonically increasing 
> offsets, where each compressed batch normally contained one or two records. 
> Then there is a batch that contains many records, whose first records have an 
> offset below the previous batch and whose last record has an offset above the 
> previous batch. Following on from this there continues a period of large 
> batches with monotonically increasing offsets, and then the log returns to 
> batches with one or two records.
> Our working assumption here is that the period before the offset dip, with 
> the small batches, is pre-outage 

[jira] [Commented] (KAFKA-4207) Partitions stopped after a rapid restart of a broker

2016-09-26 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524443#comment-15524443
 ] 

Joel Koshy commented on KAFKA-4207:
---

I have a KIP draft that has been sitting around for a while. I should be able 
to clean that up and send it out within the next week or so.

> Partitions stopped after a rapid restart of a broker
> 
>
> Key: KAFKA-4207
> URL: https://issues.apache.org/jira/browse/KAFKA-4207
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.1, 0.10.0.1
>Reporter: Dustin Cote
>
> Environment:
> 4 Kafka brokers
> 10,000 topics with one partition each, replication factor 3
> Partitions with 4KB data each
> No data being produced or consumed
> Scenario:
> Initiate controlled shutdown on one broker
> Interrupt the controlled shutdown prior to completion with a SIGKILL
> Immediately start a new broker with the same broker ID as the broker that 
> was just killed
> Symptoms:
> After starting the new broker, the other three brokers in the cluster will 
> see under-replicated partitions forever for some partitions that are hosted 
> on the broker that was killed and restarted
> Cause:
> Today, the controller sends a StopReplica command for each replica hosted on 
> a broker that has initiated a controlled shutdown.  For a large number of 
> replicas this can take a while.  When the broker that is doing the controlled 
> shutdown is killed, the StopReplica commands are queued up even though the 
> request queue to the broker is cleared.  When the broker comes back online, 
> the StopReplica commands that were queued get sent to the broker that just 
> started up.  
> CC: [~junrao] since he's familiar with the scenario seen here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1211) Hold the produce request with ack > 1 in purgatory until replicas' HW is larger than the produce offset

2016-09-26 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1211:
---
Assignee: (was: Guozhang Wang)

> Hold the produce request with ack > 1 in purgatory until replicas' HW is 
> larger than the produce offset
> 
>
> Key: KAFKA-1211
> URL: https://issues.apache.org/jira/browse/KAFKA-1211
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
> Fix For: 0.11.0.0
>
>
> Today during leader failover we have a window of weakness when the 
> followers truncate their data before fetching from the new leader, i.e., the 
> number of in-sync replicas is just 1. If during this time the leader also 
> fails, then produce requests with ack > 1 that have already been responded to 
> will still be lost. To avoid this scenario we would prefer to hold the produce 
> request in purgatory until the replicas' HW is larger than the produce offset, 
> instead of just their end-of-log offsets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4212) Add a key-value store that is a TTL persistent cache

2016-09-26 Thread Elias Levy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524402#comment-15524402
 ] 

Elias Levy commented on KAFKA-4212:
---

The general use case is the joining of updates to two tables over a limited 
period of time.

Consider a hypothetical monitoring service that allows clients to query 
the status of nodes.  The application may wish to inform the clients whenever 
the status of a node that they have queried changes, but only if the client has 
queried the status during the past 24 hours and if the last status for a node 
is different from the last status the client received.

To do so the service can consume a stream of client node status queries with 
their responses and node status updates.  From the stream of client node status 
queries the service would maintain a cache of the last status for a node sent 
to a client, such that entries expire after 24 hours.  From the node status 
updates the service would maintain a mapping of node to latest status.

When a client query is received, the service can check on the node status 
mapping to see if there is a newer status, and if there is, generate a 
notification.  When a node status update is received, the service can check the 
last status sent to clients in the cache and generate a notification with the 
new status to all clients that previously queried for a node's status.

As an optimization the mapping of nodes to latest status can also be a cache 
with a TTL, since you don't need to keep the statuses of nodes that haven't 
changed in more than 24 hours, as you'll never receive a delayed node status 
query to match them against.

Abstractly this is equivalent to a {{KTable}}-{{KTable}} inner join where 
entries in each {{KTable}} expire after some TTL, and where one table has a 
composite primary key (node id and client id on one {{KTable}} vs just node id 
on the other).

It could also be thought of as a windowed {{KTable}} - {{KTable}} join (although 
in such a case records that fall outside the window would never be used and are 
just wasting space), or a windowed {{KStream}}-{{KStream}} join of table updates 
where only the latest updated values are used (i.e. discard updates in the 
window if there is a newer update).  Although, again, these would be joins 
where the primary keys are not identical, as one is a composite.

Alas, Kafka Streams does not support windowed {{KTable}}-{{KTable}} joins, 
TTL'ed {{KeyValueStore}} s, or joins across {{KTable}} s and/or {{KStream}} s 
with different keys.

That said, the above service can be implemented by joining the client status 
query and client status update streams using custom processors and by abusing 
{{WindowStore}}.  {{WindowStore}} can be used as a form of TTL'ed 
{{KeyValueStore}}, as it will drop old values that fall out of its window, and 
by iterating in reverse order and only using the latest value. And since it 
allows you to store multiple values for the same key (node id), you can record 
the node status you handed out to clients (node id key; client id, status, and 
timestamp as value), and then, when a node status update comes in and you 
perform the join, iterate over all of them for a given node id, keeping only 
the latest one for each client id.
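
To make the workaround concrete, here is a minimal sketch of such a custom 
processor against the 0.10 Processor API. The store name "client-status-store", 
the ClientStatus value class, and the forwarding logic are assumptions made for 
illustration; only the {{WindowStore}} usage reflects the actual API.

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class NodeStatusUpdateProcessor implements Processor<String, String> {

    // Hypothetical value type: the status previously handed to a client for a node.
    public static class ClientStatus {
        final String clientId;
        final String status;
        final long timestampMs;
        ClientStatus(String clientId, String status, long timestampMs) {
            this.clientId = clientId;
            this.status = status;
            this.timestampMs = timestampMs;
        }
    }

    private static final long TTL_MS = 24L * 60 * 60 * 1000; // the 24-hour "TTL"

    private ProcessorContext context;
    private WindowStore<String, ClientStatus> statusStore; // keyed by node id

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        // Assumes a windowed store named "client-status-store" with ~24h retention
        // was added to the topology; segment dropping provides the expiration.
        statusStore = (WindowStore<String, ClientStatus>) context.getStateStore("client-status-store");
    }

    @Override
    public void process(String nodeId, String newStatus) {
        long now = context.timestamp();
        // All statuses handed out for this node within the TTL window.
        WindowStoreIterator<ClientStatus> iter = statusStore.fetch(nodeId, now - TTL_MS, now);
        // Keep only the latest entry per client id.
        Map<String, ClientStatus> latestPerClient = new HashMap<String, ClientStatus>();
        while (iter.hasNext()) {
            ClientStatus cs = iter.next().value;
            ClientStatus prev = latestPerClient.get(cs.clientId);
            if (prev == null || cs.timestampMs > prev.timestampMs)
                latestPerClient.put(cs.clientId, cs);
        }
        iter.close();
        // Notify every client whose last-seen status differs from the update.
        for (ClientStatus cs : latestPerClient.values()) {
            if (!cs.status.equals(newStatus))
                context.forward(cs.clientId, newStatus);
        }
    }

    @Override
    public void punctuate(long timestamp) { } // not used in this sketch

    @Override
    public void close() { }
}
{code}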



> Add a key-value store that is a TTL persistent cache
> 
>
> Key: KAFKA-4212
> URL: https://issues.apache.org/jira/browse/KAFKA-4212
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Elias Levy
>Assignee: Guozhang Wang
>
> Some jobs need to maintain as state a large set of key-values for some 
> period of time.  I.e. they need to maintain a TTL cache of values potentially 
> larger than memory. 
> Currently Kafka Streams provides non-windowed and windowed key-value stores.  
> Neither is an exact fit to this use case.  
> The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as 
> required, but does not support expiration.  The TTL option of RocksDB is 
> explicitly not used.
> The {{RocksDBWindowsStore}}, a {{WindowsStore}}, can expire items via segment 
> dropping, but it stores multiple items per key, based on their timestamp.  
> But this store can be repurposed as a cache by fetching the items in reverse 
> chronological order and returning the first item found.
> KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here 
> we desire a variable-capacity memory-overflowing TTL caching store.
> Although {{RocksDBWindowsStore}} can be repurposed as a cache, it would be 
> useful to have an official and proper TTL cache API and implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3965) Mirror maker sync send fails will lose messages

2016-09-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524370#comment-15524370
 ] 

Jiangjie Qin commented on KAFKA-3965:
-

[~niew37] Could you confirm the mirror maker version you are running? It seems 
the configurations specified were for the old producer. In Kafka 0.10.0.0, 
mirror maker only uses the new producer. Regarding the configurations to ensure 
no data loss, you may refer to the following link:
http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844

By default, mirror maker already uses those configurations.
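
For reference, a rough sketch of the kind of producer settings those slides 
describe (values are illustrative, not an authoritative list; in 0.10, mirror 
maker applies equivalents of these by default):

{noformat}
# No-data-loss producer settings (sketch)
acks=all
retries=2147483647
max.in.flight.requests.per.connection=1
block.on.buffer.full=true
{noformat}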

> Mirror maker sync send fails will lose messages
> ---
>
> Key: KAFKA-3965
> URL: https://issues.apache.org/jira/browse/KAFKA-3965
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: NieWang
> Fix For: 0.10.1.0
>
>
> 1. The source Kafka cluster stores some messages, each 500 bytes in size.
> 2. Mirror maker producer config:
>  producer.type=sync
>  max.request.size=400
> 3. Start mirror maker to back up messages from the source Kafka cluster to the 
> destination Kafka cluster; mirror maker will quit because the messages are 
> larger than 400 bytes.
> 4. Checking the source Kafka cluster shows the offset has been set to 1.
> 5. Checking the destination Kafka cluster shows it contains no messages.
> 6. Delete the producer config max.request.size=400 and start mirror maker 
> again. After mirror maker finishes the backup, the first message is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1342) Slow controlled shutdowns can result in stale shutdown requests

2016-09-26 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524361#comment-15524361
 ] 

Jun Rao commented on KAFKA-1342:


A similar incident was reported in 
https://issues.apache.org/jira/browse/KAFKA-4207.

> Slow controlled shutdowns can result in stale shutdown requests
> ---
>
> Key: KAFKA-1342
> URL: https://issues.apache.org/jira/browse/KAFKA-1342
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Joel Koshy
>Assignee: Joel Koshy
>Priority: Blocker
>  Labels: newbie++, newbiee
> Fix For: 0.10.1.0
>
>
> I don't think this is a bug introduced in 0.8.1, but triggered by the fact
> that controlled shutdown seems to have become slower in 0.8.1 (will file a
> separate ticket to investigate that). When doing a rolling bounce, it is
> possible for a bounced broker to stop all its replica fetchers since the
> previous PID's shutdown requests are still being processed.
> - 515 is the controller
> - Controlled shutdown initiated for 503
> - Controller starts controlled shutdown for 503
> - The controlled shutdown takes a long time in moving leaders and moving
>   follower replicas on 503 to the offline state.
> - So 503's read from the shutdown channel times out and a new channel is
>   created. It issues another shutdown request.  This request (since it is a
>   new channel) is accepted at the controller's socket server but then waits
>   on the broker shutdown lock held by the previous controlled shutdown which
>   is still in progress.
> - The above step repeats for the remaining retries (six more requests).
> - 503 hits SocketTimeout exception on reading the response of the last
>   shutdown request and proceeds to do an unclean shutdown.
> - The controller's onBrokerFailure call-back fires and moves 503's replicas
>   to offline (not too important in this sequence).
> - 503 is brought back up.
> - The controller's onBrokerStartup call-back fires and moves its replicas
>   (and partitions) to online state. 503 starts its replica fetchers.
> - Unfortunately, the (phantom) shutdown requests are still being handled and
>   the controller sends StopReplica requests to 503.
> - The first shutdown request finally finishes (after 76 minutes in my case!).
> - The remaining shutdown requests also execute and do the same thing (sends
>   StopReplica requests for all partitions to
>   503).
> - The remaining requests complete quickly because they end up not having to
>   touch zookeeper paths - no leaders left on the broker and no need to
>   shrink ISR in zookeeper since it has already been done by the first
>   shutdown request.
> - So in the end-state 503 is up, but effectively idle due to the previous
>   PID's shutdown requests.
> There are some obvious fixes that can be made to controlled shutdown to help
> address the above issue. E.g., we don't really need to move follower
> partitions to Offline. We did that as an "optimization" so the broker falls
> out of ISR sooner - which is helpful when producers set required.acks to -1.
> However it adds a lot of latency to controlled shutdown. Also, (more
> importantly) we should have a mechanism to abort any stale shutdown process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1907: MINOR: Wakeups propagated from commitOffsets in Wo...

2016-09-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1907


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4214) kafka-reassign-partitions fails all the time when brokers are bounced during reassignment

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524218#comment-15524218
 ] 

ASF GitHub Bot commented on KAFKA-4214:
---

GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/1910

KAFKA-4214:kafka-reassign-partitions fails all the time when brokers are 
bounced during reassignment

There is a corner case bug where, during partition reassignment, if the
controller and a broker receiving a new replica are bounced at the same
time, the partition reassignment fails.

The cause of this bug is a block of code in the KafkaController which
fails the reassignment if aliveNewReplicas != newReplicas, i.e., if
some of the new replicas are offline at the time a controller fails
over.

The fix is to have the controller listen for ISR change events even for
new replicas which are not alive when the controller boots up. Once the
said replicas come online, they will be in the ISR set, and the new
controller will detect this, and then mark the reassignment as
successful.

Interestingly, the block of code in question was introduced in
KAFKA-990, where a concern about this exact scenario was raised :)

This bug was revealed in the system tests in 
https://github.com/apache/kafka/pull/1904. 
The relevant tests will be enabled in either this or a followup PR when 
PR-1904 is merged.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka KAFKA-4214

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1910


commit 7f9814466592f549e8fcde914815c35dcb5a046d
Author: Apurva Mehta 
Date:   2016-09-26T21:11:44Z

There is a corner case bug where, during partition reassignment, if the
controller and a broker receiving a new replica are bounced at the same
time, the partition reassignment fails.

The cause of this bug is a block of code in the KafkaController which
fails the reassignment if aliveNewReplicas != newReplicas, i.e., if
some of the new replicas are offline at the time a controller fails
over.

The fix is to have the controller listen for ISR change events even for
new replicas which are not alive when the controller boots up. Once the
said replicas come online, they will be in the ISR set, and the new
controller will detect this, and then mark the reassignment as
successful.

Interestingly, the block of code in question was introduced in
KAFKA-990, where a concern about this exact scenario was raised :)




> kafka-reassign-partitions fails all the time when brokers are bounced during 
> reassignment
> -
>
> Key: KAFKA-4214
> URL: https://issues.apache.org/jira/browse/KAFKA-4214
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>
> Due to KAFKA-4204, we never realized that the existing system test for 
> testing reassignment would always fail when brokers were bounced mid-process. 
> This happens reliably, even for topics with varying numbers of partitions and 
> varying replication factors.
> In particular, we see errors like this in the logs when the brokers are 
> bounced: 
> {noformat}
> Status of partition reassignment:
> ERROR: Assigned replicas (1,2) don't match the list of replicas for 
> reassignment (1) for partition [test_topic,2]
> Reassignment of partition [test_topic,1] completed successfully
> Reassignment of partition [test_topic,2] failed
> Reassignment of partition [test_topic,3] completed successfully
> Reassignment of partition [test_topic,0] is still in progress
> {noformat}
> Currently, the tests which bounce brokers during reassignment are disabled 
> until this bug is fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1910: KAFKA-4214:kafka-reassign-partitions fails all the...

2016-09-26 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/1910

KAFKA-4214:kafka-reassign-partitions fails all the time when brokers are 
bounced during reassignment

There is a corner case bug where, during partition reassignment, if the
controller and a broker receiving a new replica are bounced at the same
time, the partition reassignment fails.

The cause of this bug is a block of code in the KafkaController which
fails the reassignment if aliveNewReplicas != newReplicas, i.e., if
some of the new replicas are offline at the time a controller fails
over.

The fix is to have the controller listen for ISR change events even for
new replicas which are not alive when the controller boots up. Once the
said replicas come online, they will be in the ISR set, and the new
controller will detect this, and then mark the reassignment as
successful.

Interestingly, the block of code in question was introduced in
KAFKA-990, where a concern about this exact scenario was raised :)

This bug was revealed in the system tests in 
https://github.com/apache/kafka/pull/1904. 
The relevant tests will be enabled in either this or a followup PR when 
PR-1904 is merged.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka KAFKA-4214

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1910


commit 7f9814466592f549e8fcde914815c35dcb5a046d
Author: Apurva Mehta 
Date:   2016-09-26T21:11:44Z

There is a corner case bug where, during partition reassignment, if the
controller and a broker receiving a new replica are bounced at the same
time, the partition reassignment fails.

The cause of this bug is a block of code in the KafkaController which
fails the reassignment if aliveNewReplicas != newReplicas, i.e., if
some of the new replicas are offline at the time a controller fails
over.

The fix is to have the controller listen for ISR change events even for
new replicas which are not alive when the controller boots up. Once the
said replicas come online, they will be in the ISR set, and the new
controller will detect this, and then mark the reassignment as
successful.

Interestingly, the block of code in question was introduced in
KAFKA-990, where a concern about this exact scenario was raised :)




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3999) Consumer bytes-fetched metric uses decompressed message size

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524194#comment-15524194
 ] 

ASF GitHub Bot commented on KAFKA-3999:
---

GitHub user vahidhashemian reopened a pull request:

https://github.com/apache/kafka/pull/1698

KAFKA-3999: Record raw size of fetch responses as part of consumer metrics

Currently, only the decompressed size of fetch responses is recorded. This 
PR adds a sensor to keep track of the raw size as well.
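
As an illustration of the mechanism (not the PR's actual code), a sensor in 
Kafka's metrics library can record the wire size of each fetch response; the 
sensor and metric names below are hypothetical:

{code:java}
import java.util.HashMap;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Max;

public class RawFetchSizeSensorSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        // Hypothetical sensor tracking the raw (compressed) size of fetch responses.
        Sensor rawBytesFetched = metrics.sensor("raw-bytes-fetched");
        rawBytesFetched.add(new MetricName("fetch-size-raw-avg", "consumer-fetch-metrics",
                "The average raw size of fetch responses", new HashMap<String, String>()), new Avg());
        rawBytesFetched.add(new MetricName("fetch-size-raw-max", "consumer-fetch-metrics",
                "The maximum raw size of fetch responses", new HashMap<String, String>()), new Max());
        rawBytesFetched.record(4096); // record the on-the-wire size of one response
        metrics.close();
    }
}
{code}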

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-3999

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1698


commit a2b4425071c52a0428157a2bdf8c1e20c9af66a0
Author: Vahid Hashemian 
Date:   2016-08-02T23:07:27Z

KAFKA-3999: Record raw size of fetch responses

Currently, only the decompressed size of fetch responses is recorded. This 
PR adds a sensor to keep track of their raw size as well.




> Consumer bytes-fetched metric uses decompressed message size
> 
>
> Key: KAFKA-3999
> URL: https://issues.apache.org/jira/browse/KAFKA-3999
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>Priority: Minor
>
> It looks like the computation for the bytes-fetched metrics uses the size of 
> the decompressed message set. I would have expected it to be based on the 
> raw size of the fetch responses. Perhaps it would be helpful to expose both 
> the raw and decompressed fetch sizes? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3999) Consumer bytes-fetched metric uses decompressed message size

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524193#comment-15524193
 ] 

ASF GitHub Bot commented on KAFKA-3999:
---

Github user vahidhashemian closed the pull request at:

https://github.com/apache/kafka/pull/1698


> Consumer bytes-fetched metric uses decompressed message size
> 
>
> Key: KAFKA-3999
> URL: https://issues.apache.org/jira/browse/KAFKA-3999
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>Priority: Minor
>
> It looks like the computation for the bytes-fetched metrics uses the size of 
> the decompressed message set. I would have expected it to be based on the 
> raw size of the fetch responses. Perhaps it would be helpful to expose both 
> the raw and decompressed fetch sizes? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1698: KAFKA-3999: Record raw size of fetch responses as ...

2016-09-26 Thread vahidhashemian
GitHub user vahidhashemian reopened a pull request:

https://github.com/apache/kafka/pull/1698

KAFKA-3999: Record raw size of fetch responses as part of consumer metrics

Currently, only the decompressed size of fetch responses is recorded. This 
PR adds a sensor to keep track of the raw size as well.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-3999

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1698


commit a2b4425071c52a0428157a2bdf8c1e20c9af66a0
Author: Vahid Hashemian 
Date:   2016-08-02T23:07:27Z

KAFKA-3999: Record raw size of fetch responses

Currently, only the decompressed size of fetch responses is recorded. This 
PR adds a sensor to keep track of their raw size as well.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1698: KAFKA-3999: Record raw size of fetch responses as ...

2016-09-26 Thread vahidhashemian
Github user vahidhashemian closed the pull request at:

https://github.com/apache/kafka/pull/1698


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3708) Rethink exception handling in KafkaStreams

2016-09-26 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524190#comment-15524190
 ] 

Guozhang Wang commented on KAFKA-3708:
--

PR submitted from [~damianguy]: https://github.com/apache/kafka/pull/1819

> Rethink exception handling in KafkaStreams
> --
>
> Key: KAFKA-3708
> URL: https://issues.apache.org/jira/browse/KAFKA-3708
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> As of 0.10.0.0, the worker threads (i.e. {{StreamThreads}}) can possibly 
> encounter the following runtime exceptions:
> 1) {{consumer.poll()}} could throw KafkaException if some of the 
> configurations are not accepted, such as topics not authorized for read / 
> write (security), an invalid session-timeout value, etc.; these exceptions 
> will be thrown in the first ever {{poll()}}.
> 2) {{task.addRecords()}} could throw KafkaException (most likely 
> SerializationException) if the deserialization fails.
> 3) {{task.process() / punctuate()}} could throw various KafkaException; for 
> example, serialization / deserialization errors, state storage operation 
> failures (RocksDBException, for example), producer sending failures, etc.
> 4) {{maybeCommit / commitAll / commitOne}} could throw various Exceptions if 
> the flushing of a state store fails, or when {{consumer.commitSync}} throws 
> exceptions other than {{CommitFailedException}}.
> For all the above 4 cases, KafkaStreams does not capture and handle them, but 
> exposes them to users, letting users handle them via 
> {{KafkaStreams.setUncaughtExceptionHandler}}. We need to re-think whether the 
> library should just handle these cases without exposing them to users, killing 
> the threads / migrating tasks to others, since they are all not recoverable.
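
For context, this is how a user registers such a handler today; a minimal 
sketch with illustrative topic and application names:

{code:java}
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class StreamsExceptionHandlerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");        // illustrative
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // illustrative

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic"); // trivial pass-through topology

        KafkaStreams streams = new KafkaStreams(builder, props);
        // Today, runtime exceptions from the StreamThreads surface here and the
        // user must decide how to react (log, shut down, restart, ...).
        streams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread t, Throwable e) {
                System.err.println("Stream thread " + t.getName() + " died: " + e);
            }
        });
        streams.start();
    }
}
{code}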



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4212) Add a key-value store that is a TTL persistent cache

2016-09-26 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524186#comment-15524186
 ] 

Guozhang Wang commented on KAFKA-4212:
--

Hello [~elevy], thanks for reporting this feature request.

Could you elaborate a bit more on your use case, and on how a persistent store 
with a TTL mechanism would help, so that we can better understand the 
motivation for and commonality of this feature?

> Add a key-value store that is a TTL persistent cache
> 
>
> Key: KAFKA-4212
> URL: https://issues.apache.org/jira/browse/KAFKA-4212
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Elias Levy
>Assignee: Guozhang Wang
>
> Some jobs need to maintain as state a large set of key-values for some 
> period of time.  I.e. they need to maintain a TTL cache of values potentially 
> larger than memory. 
> Currently Kafka Streams provides non-windowed and windowed key-value stores.  
> Neither is an exact fit to this use case.  
> The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as 
> required, but does not support expiration.  The TTL option of RocksDB is 
> explicitly not used.
> The {{RocksDBWindowsStore}}, a {{WindowsStore}}, can expire items via segment 
> dropping, but it stores multiple items per key, based on their timestamp.  
> But this store can be repurposed as a cache by fetching the items in reverse 
> chronological order and returning the first item found.
> KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here 
> we desire a variable-capacity memory-overflowing TTL caching store.
> Although {{RocksDBWindowsStore}} can be repurposed as a cache, it would be 
> useful to have an official and proper TTL cache API and implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3174) Re-evaluate the CRC32 class performance.

2016-09-26 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-3174:

Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Re-evaluate the CRC32 class performance.
> 
>
> Key: KAFKA-3174
> URL: https://issues.apache.org/jira/browse/KAFKA-3174
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.2.0
>
>
> We used org.apache.kafka.common.utils.CRC32 in clients because it has better 
> performance than java.util.zip.CRC32 in Java 1.6.
> In a recent test I ran, it looks like in Java 1.8 the CRC32 class is 2x as 
> fast as the Crc32 class we are using now. We may want to re-evaluate the 
> performance of the Crc32 class and see if it makes sense to simply use the 
> Java CRC32 instead.
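
For anyone who wants to reproduce the comparison, a naive timing sketch along 
these lines could be used (not a rigorous JMH benchmark; buffer size and 
iteration counts are arbitrary illustrative choices):

{code:java}
import java.util.Random;
import java.util.zip.CRC32;
import org.apache.kafka.common.utils.Crc32;

public class CrcComparison {
    public static void main(String[] args) {
        byte[] data = new byte[4096];
        new Random(42).nextBytes(data);

        // Warm up both implementations so the JIT compiles the hot loops.
        runJdk(data, 100000);
        runKafka(data, 100000);

        long t0 = System.nanoTime();
        runJdk(data, 1000000);
        long t1 = System.nanoTime();
        runKafka(data, 1000000);
        long t2 = System.nanoTime();

        System.out.printf("java.util.zip.CRC32: %d ms%n", (t1 - t0) / 1000000);
        System.out.printf("kafka Crc32:         %d ms%n", (t2 - t1) / 1000000);
    }

    static long runJdk(byte[] data, int iters) {
        long sum = 0;
        for (int i = 0; i < iters; i++) {
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            sum += crc.getValue(); // consume the result so the loop isn't dead code
        }
        return sum;
    }

    static long runKafka(byte[] data, int iters) {
        long sum = 0;
        for (int i = 0; i < iters; i++) {
            Crc32 crc = new Crc32();
            crc.update(data, 0, data.length);
            sum += crc.getValue();
        }
        return sum;
    }
}
{code}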



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3331) Refactor TopicCommand to make it testable and add unit tests

2016-09-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524171#comment-15524171
 ] 

Jiangjie Qin commented on KAFKA-3331:
-

It seems the patch has been there for quite some time. [~gwenshap] Are we going 
to switch to the admin client instead?

> Refactor TopicCommand to make it testable and add unit tests
> 
>
> Key: KAFKA-3331
> URL: https://issues.apache.org/jira/browse/KAFKA-3331
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 0.9.0.1
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
> Fix For: 0.10.1.0
>
>
> TopicCommand has become a functionality-packed, hard-to-read class. Adding 
> to or changing it with confidence requires some unit tests around it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4120) byte[] keys in RocksDB state stores do not work as expected

2016-09-26 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524168#comment-15524168
 ] 

Guozhang Wang commented on KAFKA-4120:
--

[~elevy] {{o.a.k.common.utils.Bytes}} was added in trunk after the 0.10.0.1 
release, and hence the corresponding Javadoc will only be available with the 
0.10.1.0 release.

[~gfodor] As part of KIP-63, we have actually pursued the second option, making 
all the LRU caches store bytes. This arguably will be less efficient due 
to ser / deser, but we believe it is beneficial for memory management in Kafka 
Streams in the long run. More discussions can be found here:

https://cwiki.apache.org/confluence/display/KAFKA/Discussion%3A+Memory+Management+in+Kafka+Streams

> byte[] keys in RocksDB state stores do not work as expected
> ---
>
> Key: KAFKA-4120
> URL: https://issues.apache.org/jira/browse/KAFKA-4120
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Greg Fodor
>Assignee: Guozhang Wang
>
> We ran into an issue using a byte[] key in a RocksDB state store (with the 
> byte array serde). Internally, the RocksDB store keeps an LRUCache that is 
> backed by a LinkedHashMap that sits between the callers and the actual db. 
> The problem is that while the underlying rocks db will persist byte arrays 
> with equal data as equivalent keys, the LinkedHashMap uses byte[] reference 
> equality from Object.equals/hashcode. So, this can result in multiple entries 
> in the cache for two different byte arrays that have the same contents and 
> are backed by the same key in the db, resulting in unexpected behavior. 
> One such behavior that manifests from this: if you store a value in the 
> state store with a specific key and re-read that key with the same byte 
> array, you will get the new value; but if you re-read that key with a 
> different byte array with the same bytes, you will get a stale value until 
> the db is flushed. (This made it particularly tricky to track down what was 
> happening :))
> The workaround for us is to convert the keys from raw byte arrays to a 
> deserialized avro structure that provides proper hashcode/equals semantics 
> for the intermediate cache. In general this seems like good practice, so one 
> of the proposed solutions is to simply emit a warning or exception if a key 
> type with breaking semantics like this is provided.
> A few proposed solutions:
> - When the state store is defined on array keys, ensure that the cache map 
> does proper comparisons on array values, not array references. This would fix 
> this problem, but seems a bit strange to special-case. However, I have a hard 
> time thinking of other examples where this behavior would burn users.
> - Change the LRU cache to deserialize and serialize all keys to bytes and use 
> a value based comparison for the map. This would be the most correct, as it 
> would ensure that both the rocks db and the cache have identical key spaces 
> and equality/hashing semantics. However, this is probably slow, and since the 
> general case of using avro record types as keys works fine, it will largely 
> be unnecessary overhead.
> - Don't change anything about the behavior, but trigger a warning in the log 
> or fail to start if a state store is defined on array keys (or possibly any 
> key type that fails to properly override Object.equals/hashcode.)
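
The reference-equality trap is easy to reproduce outside Kafka; a minimal, 
self-contained illustration (not Kafka code):

{code:java}
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class ByteArrayKeyDemo {
    public static void main(String[] args) {
        Map<byte[], String> cache = new LinkedHashMap<byte[], String>();
        byte[] k1 = new byte[] {1, 2, 3};
        byte[] k2 = new byte[] {1, 2, 3}; // same contents, different reference

        cache.put(k1, "value");
        System.out.println(cache.get(k1));         // "value" -- same reference hits
        System.out.println(cache.get(k2));         // null -- equal contents miss
        System.out.println(Arrays.equals(k1, k2)); // true -- contents are equal
    }
}
{code}

A wrapper with value-based equals/hashCode (such as o.a.k.common.utils.Bytes) 
avoids the problem, which is essentially the workaround described above.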



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1909: MINOR: Allow for asynchronous start of producer co...

2016-09-26 Thread kkonstantine
GitHub user kkonstantine opened a pull request:

https://github.com/apache/kafka/pull/1909

MINOR: Allow for asynchronous start of producer consumer in validation test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkonstantine/kafka 
MINOR-Async-start-in-produce-consume-validate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1909.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1909


commit d7a77ffb27426e3d369485a6975c0e0bddf03153
Author: Konstantine Karantasis 
Date:   2016-09-25T21:12:12Z

MINOR: Allow for asynchronous start of producer consumer in validation test




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


SendFailedException

2016-09-26 Thread Ghosh, Achintya (Contractor)
Hi there,

Can anyone please help us? We are getting a SendFailedException when the Kafka 
consumer is starting, and it is not able to consume any messages.

Thanks
Achintya


[jira] [Commented] (KAFKA-3697) Clean-up website documentation when it comes to clients

2016-09-26 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523884#comment-15523884
 ] 

Vahid Hashemian commented on KAFKA-3697:


[~ijuma] Starting to look at this and wondering if we should also update the 
doc examples to use the Java consumer (part of this JIRA or otherwise).

> Clean-up website documentation when it comes to clients
> ---
>
> Key: KAFKA-3697
> URL: https://issues.apache.org/jira/browse/KAFKA-3697
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Vahid Hashemian
> Fix For: 0.10.1.0
>
>
> We should only mention the Java consumer and producer as they are the 
> recommended clients now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-4220) Clean up & provide better error message when incorrect argument types are provided in the command line client

2016-09-26 Thread Ishita Mandhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4220 started by Ishita Mandhan.
-
> Clean up & provide better error message when incorrect argument types are 
> provided in the command line client
> -
>
> Key: KAFKA-4220
> URL: https://issues.apache.org/jira/browse/KAFKA-4220
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ishita Mandhan
>Assignee: Ishita Mandhan
>Priority: Minor
>
> When the argument provided to a command line statement is not of the right 
> type, a stack trace is returned. This can be replaced by a cleaner error 
> message that is easier for the user to read & understand.
> For example-
> bin/kafka-console-consumer.sh --new-consumer --bootstrap-server 
> localhost:9092 --topic foo --timeout-ms abc
> 'abc' is an incorrect type for the --timeout-ms parameter, which expects a 
> number.
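
A hypothetical sketch of the desired behavior using joptsimple (which the 
Kafka command line tools use): catch the parser's OptionException and print 
its message instead of letting the stack trace escape. The option name mirrors 
the example above; everything else is illustrative.

{code:java}
import joptsimple.OptionException;
import joptsimple.OptionParser;

public class ArgTypeCheckSketch {
    public static void main(String[] args) {
        OptionParser parser = new OptionParser();
        // --timeout-ms must parse as a number
        parser.accepts("timeout-ms", "consumer timeout in milliseconds")
              .withRequiredArg()
              .ofType(Long.class);
        try {
            parser.parse(args);
        } catch (OptionException e) {
            // e.g. for "--timeout-ms abc": print a one-line error, no stack trace
            System.err.println("Error: " + e.getMessage());
            System.exit(1);
        }
    }
}
{code}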



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1908: Kafka 3396

2016-09-26 Thread edoardocomar
GitHub user edoardocomar opened a pull request:

https://github.com/apache/kafka/pull/1908

Kafka 3396

Reopening of https://github.com/apache/kafka/pull/1428

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/edoardocomar/kafka KAFKA-3396

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1908.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1908


commit 371079983a92a240efda423cbf1a446eba7fa65f
Author: Edoardo Comar 
Date:   2016-05-26T13:18:12Z

KAFKA-3396 : Unauthorized topics are returned to the user

Modified KafkaApis to return Errors.UNKNOWN_TOPIC_OR_PARTITION if
principal has no Describe access to topic

Unit tests expanded

Some paths cause the client to block due to bug
https://issues.apache.org/jira/browse/KAFKA-3727?filter=-2
tests work around this by executing in separate thread

commit b6510b0f2059a92dfb4f6c427df50ac4e7339d8e
Author: Mickael Maison 
Date:   2016-07-26T18:20:06Z

KAFKA-3396: Updates and cleanups following the feedback

commit 38708ad2d1990304f75c2981350a9d211ecd173a
Author: Mickael Maison 
Date:   2016-08-17T12:34:19Z

KAFKA-3396: More updates + rebased off trunk

- Added tests from 44ad3ec
- Small refactorings

commit 6d33f133adf68ee43cc05fa99645f78e5bf75e10
Author: Edoardo Comar 
Date:   2016-09-21T09:34:02Z

KAFKA-3396 : Unauthorized topics are returned to the user

Rebased after kip-79 changes.
Fixing leak of topic for LIST_OFFSETS when unauthorized.
Added tests.

commit 9d80a9af83bf89a0884055d80c37aa170ffd633e
Author: Edoardo Comar 
Date:   2016-09-23T15:34:48Z

KAFKA-3396 : Unauthorized topics are returned to the user

cleanup after review

commit 226d3d0b85dac22dbd5062d44c50692b64161a7e
Author: Edoardo Comar 
Date:   2016-09-26T10:58:07Z

KAFKA-3396 : Unauthorized topics are returned to the user

Cleanup addressing comments

commit 01db26cbe832ddc040977b6f4f41adbc532a2657
Author: Edoardo Comar 
Date:   2016-09-26T11:14:38Z

KAFKA-3396 : Unauthorized topics are returned to the user

Cleanup addressing comments

commit 5ea87a379b5bfb473c3476dd0a0da072f11becd9
Author: Edoardo Comar 
Date:   2016-09-26T18:10:53Z

KAFKA-3396 : Unauthorized topics are returned to the user

Revised handling of FETCH PRODUCE and DELETE requests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: is this a bug? - message disorder in async send mode -- 0.9.0 java client sdk InFlightRequests

2016-09-26 Thread Jason Gustafson
Hi there,

The Kafka server implements head-of-line request blocking, which means that
it will only handle one request at a time from a given socket. That means that
the responses will always be returned in the same order as the requests
were sent.

-Jason

On Sat, Sep 24, 2016 at 1:19 AM, 一生有你  wrote:

> We know that in the async send mode, Kafka does not guarantee the message
> order, even for the same partition.
>
>
> That is, if we send 3 requests (the same topic, the same partition) to a
> Kafka server in the async mode,
> the send order is 1, 2, 3 (correlation id is 1, 2, 3), while the Kafka
> server may save the 3 requests in the log in the order 3, 2, 1, and
> return to the client in the order 2, 3, 1.
>
>
> This happens because the Kafka server processes requests with multiple
> threads (multiple KafkaRequestHandlers).
>
>
> If the above is true, the 0.9.0 Java client SDK below may have a
> problem:
>
>
> In the class NetworkClient, there is a collection inFlightRequests to
> maintain all the in-flight requests:
>
> private final InFlightRequests inFlightRequests;
>
> final class InFlightRequests {
>     private final int maxInFlightRequestsPerConnection;
>     private final Map<String, Deque<ClientRequest>> requests =
>         new HashMap<String, Deque<ClientRequest>>();
>     ...
> }
>
> It uses a Deque to maintain the in-flight requests whose responses have not
> come back. Whenever we send a request, we enqueue the request, and when the
> response comes back, we dequeue the request.
>
> private void doSend(ClientRequest request, long now) {
>     request.setSendTimeMs(now);
>     this.inFlightRequests.add(request);
>     selector.send(request.request());
> }
>
> private void handleCompletedReceives(List<ClientResponse> responses, long now) {
>     for (NetworkReceive receive : this.selector.completedReceives()) {
>         String source = receive.source();
>         ClientRequest req = inFlightRequests.completeNext(source);
>         ResponseHeader header = ResponseHeader.parse(receive.payload());
>         // Always expect the response version id to be the same as the
>         // request version id
>         short apiKey = req.request().header().apiKey();
>         short apiVer = req.request().header().apiVersion();
>         Struct body = (Struct) ProtoUtils.responseSchema(apiKey,
>             apiVer).read(receive.payload());
>         correlate(req.request().header(), header);
>         if (!metadataUpdater.maybeHandleCompletedReceive(req, now, body))
>             responses.add(new ClientResponse(req, now, false, body));
>     }
> }
>
> But if the request order and the response order do not match, is the
> Deque suitable? Or should a Map be used to maintain the requests?
>
> By the way, in the above there is a function correlate(...) to check the
> match; if they do not match, it throws an exception.
>
> private void correlate(RequestHeader requestHeader, ResponseHeader responseHeader) {
>     if (requestHeader.correlationId() != responseHeader.correlationId())
>         throw new IllegalStateException("Correlation id for response (" +
>             responseHeader.correlationId() + ") does not match request (" +
>             requestHeader.correlationId() + ")");
> }
>
> But in the async mode, as mentioned above, the mismatch is normal and
> likely to happen. So is it enough to handle the problem by just throwing
> an exception?


[GitHub] kafka pull request #1907: MINOR: Wakeups propagated from commitOffsets in Wo...

2016-09-26 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1907

MINOR: Wakeups propagated from commitOffsets in WorkerSinkTask should be 
caught



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka catch-wakeup-worker-sink-task

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1907


commit 88fd33cf7f3d843ff8c7b0b371b41a5ac6975634
Author: Jason Gustafson 
Date:   2016-09-26T17:34:45Z

MINOR: Wakeups propagated from commitOffsets in WorkerSinkTask should be 
caught




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4216) Replication Quotas: Control Leader & Follower Throttled Replicas Separately

2016-09-26 Thread Ben Stopford (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Stopford updated KAFKA-4216:

Status: Patch Available  (was: In Progress)

> Replication Quotas: Control Leader & Follower Throttled Replicas Separately
> ---
>
> Key: KAFKA-4216
> URL: https://issues.apache.org/jira/browse/KAFKA-4216
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Ben Stopford
>Assignee: Ben Stopford
>
> The current algorithm in kafka-reassign-partitions applies a throttle to all 
> moving replicas, be they leader-side or follower-side. 
> A preferable solution would be to change the throttled replica list to 
> specify whether the throttle applies to leader or follower. That way we can 
> ensure that the regular replication will not be throttled.  
> To do this we should change the way the throttled replica list is specified 
> so it is spread over two separate properties. One that corresponds to the 
> leader-side throttle, and the other that corresponds to the follower-side 
> throttle.
> quota.leader.replication.throttled.replicas = 
> [partId]:[replica],[partId]:[replica],[partId]:[replica] 
> quota.follower.replication.throttled.replicas = 
> [partId]:[replica],[partId]:[replica],[partId]:[replica] 
> Then, when applying the throttle, the leader quota can be applied to all 
> current replicas, and the follower quota can be applied only to the new 
> replicas we are creating as part of the rebalance. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-4216) Replication Quotas: Control Leader & Follower Throttled Replicas Separately

2016-09-26 Thread Ben Stopford (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4216 started by Ben Stopford.
---
> Replication Quotas: Control Leader & Follower Throttled Replicas Separately
> ---
>
> Key: KAFKA-4216
> URL: https://issues.apache.org/jira/browse/KAFKA-4216
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Ben Stopford
>Assignee: Ben Stopford
>
> The current algorithm in kafka-reassign-partitions applies a throttle to all 
> moving replicas, be they leader-side or follower-side. 
> A preferable solution would be to change the throttled replica list to 
> specify whether the throttle applies to leader or follower. That way we can 
> ensure that the regular replication will not be throttled.  
> To do this we should change the way the throttled replica list is specified 
> so it is spread over two separate properties. One that corresponds to the 
> leader-side throttle, and the other that corresponds to the follower-side 
> throttle.
> quota.leader.replication.throttled.replicas = 
> [partId]:[replica],[partId]:[replica],[partId]:[replica] 
> quota.follower.replication.throttled.replicas = 
> [partId]:[replica],[partId]:[replica],[partId]:[replica] 
> Then, when applying the throttle, the leader quota can be applied to all 
> current replicas, and the follower quota can be applied only to the new 
> replicas we are creating as part of the rebalance. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4221) KafkaConsumer Fetcher can send ListOffsets requests in parallel when initializing position

2016-09-26 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-4221:
--

 Summary: KafkaConsumer Fetcher can send ListOffsets requests in 
parallel when initializing position
 Key: KAFKA-4221
 URL: https://issues.apache.org/jira/browse/KAFKA-4221
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Reporter: Jason Gustafson
 Fix For: 0.10.2.0


With the changes in KIP-79, we've added basic support for sending ListOffsets 
requests in parallel; however, we do not yet utilize this capability when 
resetting offsets internally. Instead, we send a separate ListOffsets request 
for each partition. It should be straightforward to refactor Fetcher to make 
use of parallel requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-73: Replication Quotas

2016-09-26 Thread Jun Rao
Yes, this change makes sense to me since it gives the admin better control.

Thanks,

Jun

On Sun, Sep 25, 2016 at 7:35 AM, Ben Stopford  wrote:

> Hi All
>
> We’ve made an adjustment to KIP-73: Replication Quotas which I’d like to
> open up to the community for approval.
>
> Previously the admin passed a list of replicas to be throttled:
>
> quota.replication.throttled.replicas =
> [partId]:[replica],[partId]:[replica],[partId]:[replica] etc
>
> The change is to split this into two properties. One that corresponds to
> the leader-side throttle, and the other that corresponds to the
> follower-side throttle.
>
> quota.leader.replication.throttled.replicas =
> [partId]:[replica],[partId]:[replica],[partId]:[replica]
> quota.follower.replication.throttled.replicas =
> [partId]:[replica],[partId]:[replica],[partId]:[replica]
>
> This provides more control over the throttling process. It also helps us
> with a common use case which I’ve described below, for those interested.
>
> Please reply if you have any comments / issues / suggestions.
>
> Thanks as ever.
>
> Ben
>
>
> Sample Use Case:
>
> Say we are moving partition 0. It has replicas [104,107,109] moving to
> [105,107,113]
>
> So the leader could be any of [104,107,109] and we know we will be
> creating new replicas on 105 & 113.
>
> In the current mechanism, all we can do is apply both leader and follower
> throttles to all 5:  [104,107,109,105,113] which will mean the regular
> replication traffic between (say 107 is leader) 107->104 and 107->109 will
> be throttled also. This is potentially problematic.
>
> What we really want to do is apply:
>
> LeaderThrottle [104,107,109]
> FollowerThrottle [105,113]
>
> This way, during a rebalance, we ensure that standard replication traffic will
> not be throttled, but the rebalance will perform correctly if leaders
> move. One subtlety is that, should the leader move to the “move
> destination” node, it would no longer be throttled. But this is actually to
> our benefit in the normal case.
>
>


[GitHub] kafka pull request #1906: KAFKA 4216: Control Leader & Follower Throttled Re...

2016-09-26 Thread benstopford
GitHub user benstopford opened a pull request:

https://github.com/apache/kafka/pull/1906

KAFKA 4216: Control Leader & Follower Throttled Replicas Separately

Splits the throttled replica configuration (the list of which replicas 
should be throttled for each topic) into two. One for the leader throttle, one 
for the follower throttle.

So:
 quota.replication.throttled.replicas
=> 
quota.leader.replication.throttled.replicas & 
quota.follower.replication.throttled.replicas

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/benstopford/kafka 
KAFKA-4216-seperate-leader-and-follower-throttled-replica-lists

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1906.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1906


commit 03a786d4cf1bf73e7c096c226d8e3984e73d5c6f
Author: Ben Stopford 
Date:   2016-09-26T14:26:11Z

KAFKA-4216: First cut of split of leader/follower replica lists.

commit 1570292da52a656da608669be57b7759a248af87
Author: Ben Stopford 
Date:   2016-09-26T15:34:41Z

KAFKA-4216: Refactored the way we calculate which replicas should be 
throttled. No functional change in this commit.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3102) Kafka server unable to connect to zookeeper

2016-09-26 Thread SOFT COMPANY - NEJJAR, Azeddine (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523330#comment-15523330
 ] 

SOFT COMPANY - NEJJAR, Azeddine commented on KAFKA-3102:


Hello,

Have you found any solution for the Kafka connection error?
I would like to configure Kafka to connect to a SASL-enabled ZooKeeper.

Here is my error message:

Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23 1 3.
[Krb5LoginModule] authentication failed
Checksum failed
[2016-09-26 16:30:13,159] WARN SASL configuration failed: 
javax.security.auth.login.LoginException: Checksum failed Will continue 
connection to Zookeeper server without SASL authentication, if Zookeeper server 
allows it. (org.apache.zookeeper.ClientCnxn)

Thanks a lot for your help,
Sincerely,

Azeddine






> Kafka server unable to connect to zookeeper
> ---
>
> Key: KAFKA-3102
> URL: https://issues.apache.org/jira/browse/KAFKA-3102
> Project: Kafka
>  Issue Type: Bug
>  Components: security
> Environment: RHEL 6
>Reporter: Mohit Anchlia
>
> The server disconnects from ZooKeeper with the following log, and the logs are 
> not indicative of any problem. It works without the security setup, however. 
> I followed the security configuration steps from this site: 
> http://docs.confluent.io/2.0.0/kafka/sasl.html
> Below are the list of principals, the logs, and the JAAS file:
> 1) Jaas file 
> KafkaServer {
> com.sun.security.auth.module.Krb5LoginModule required
> useKeyTab=true
> storeKey=true
> keyTab="/mnt/kafka/kafka/kafka.keytab"
> principal="kafka/10.24.251@example.com";
> };
> Client {
> com.sun.security.auth.module.Krb5LoginModule required
> useKeyTab=true
> storeKey=true
> keyTab="/mnt/kafka/kafka/kafka.keytab"
> principal="kafka/10.24.251@example.com";
> };
> 2) Principals from krb admin
> kadmin.local:  list_principals
> K/m...@example.com
> kadmin/ad...@example.com
> kadmin/chang...@example.com
> kadmin/ip-10-24-251-175.us-west-2.compute.inter...@example.com
> kafka/10.24.251@example.com
> krbtgt/example@example.com
> [2016-01-13 16:26:00,551] INFO starting (kafka.server.KafkaServer)
> [2016-01-13 16:26:00,557] INFO Connecting to zookeeper on localhost:2181 
> (kafka.server.KafkaServer)
> [2016-01-13 16:27:30,718] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to 
> zookeeper server within timeout: 6000
> at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1223)
> at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:155)
> at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:129)
> at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:89)
> at kafka.utils.ZkUtils$.apply(ZkUtils.scala:71)
> at kafka.server.KafkaServer.initZk(KafkaServer.scala:278)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:168)
> at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
> at kafka.Kafka$.main(Kafka.scala:67)
> at kafka.Kafka.main(Kafka.scala)
> [2016-01-13 16:27:30,721] INFO shutting down (kafka.server.KafkaServer)
> [2016-01-13 16:27:30,727] INFO shut down completed (kafka.server.KafkaServer)
> [2016-01-13 16:27:30,728] FATAL Fatal error during KafkaServerStartable 
> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to 
> zookeeper server within timeout: 6000
> at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1223)
> at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:155)
> at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:129)
> at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:89)
> at kafka.utils.ZkUtils$.apply(ZkUtils.scala:71)
> at kafka.server.KafkaServer.initZk(KafkaServer.scala:278)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:168)
> at 
> 

Re: Permission to edit KIP pages

2016-09-26 Thread Khurrum Nasim
Hello Ismael,

My wiki user name is khurrumnasimm.

Thanks,
KN

On Sun, Sep 18, 2016 at 3:52 PM, Ismael Juma  wrote:

> Hi KN,
>
> What's your wiki user name?
>
> Ismael
>
> On Sun, Sep 18, 2016 at 3:02 AM, Khurrum Nasim 
> wrote:
>
> > Hi,
> >
> > I wanted to edit a KIP page and would like to get the permission for
> that.
> > Currently I don't have edit authorization. It does not show me an option
> to
> > edit.
> > Can one of the committers grant me permission?
> >
> > Thanks,
> > KN
> >
>


[jira] [Commented] (KAFKA-3919) Broker fails to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-09-26 Thread Andy Coates (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522746#comment-15522746
 ] 

Andy Coates commented on KAFKA-3919:


Thanks [~ijuma]. Given that the problems I'm experiencing are purely related to 
messages produced with acks=1 and KAFKA-1211 speaks explicitly about acks>1, 
I'd be interested in whether [~junrao] thinks that KAFKA-1211 does indeed fix this 
issue too, given this new information on the problem.

Thanks all!

> Broker fails to start after ungraceful shutdown due to non-monotonically 
> incrementing offsets in logs
> --
>
> Key: KAFKA-3919
> URL: https://issues.apache.org/jira/browse/KAFKA-3919
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Andy Coates
>
> Hi All,
> I encountered an issue with Kafka following a power outage that saw a 
> proportion of our cluster disappear. When the power came back on several 
> brokers halted on start up with the error:
> {noformat}
>   Fatal error during KafkaServerStartable startup. Prepare to shutdown”
>   kafka.common.InvalidOffsetException: Attempt to append an offset 
> (1239742691) to position 35728 no larger than the last offset appended 
> (1239742822) to 
> /data3/kafka/mt_xp_its_music_main_itsevent-20/001239444214.index.
>   at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
>   at kafka.log.LogSegment.recover(LogSegment.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at kafka.log.Log.loadSegments(Log.scala:160)
>   at kafka.log.Log.<init>(Log.scala:90)
>   at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The only way to recover the brokers was to delete the log files that 
> contained non-monotonically incrementing offsets.
> We've spent some time digging through the logs and I feel I may have worked 
> out the sequence of events leading to this issue (though this is based on 
> some assumptions I've made about the way Kafka is working, which may be 
> wrong).
> First off, we have unclean leadership elections disabled. (We did later enable 
> them to help get around some other issues we were having, but this was 
> several hours after this issue manifested.) We're producing to the topic 
> with gzip compression and acks=1.
> We looked through the data logs that were causing the brokers to not start. 
> What we found was that the initial part of the log has monotonically increasing 
> offsets, where each compressed batch normally contained one or two records. 
> Then there is a batch that contains many records, whose first records have an 
> offset below the previous batch and whose last record has an offset above the 
> previous batch. Following on from this there continues a period of large 
> batches, with monotonically increasing offsets, and then the log returns to 
> batches with one or two records.
> Our working assumption here is that the period before the offset dip, with 
> the small batches, is pre-outage normal operation. The period of larger 
> batches is from just after the outage, where producers have a backlog to 
> process when the partition becomes available, and then things return to 
> normal batch sizes again once the backlog clears.
> We also looked through Kafka's application logs to try and piece together 
> the series of events leading up to this. Here's what we know happened, 
> with regards to one partition that has issues, from the logs:
> Prior 
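For context on the acks discussion in this thread, here is a minimal producer 
sketch (standard Java producer API; the topic name is hypothetical) showing the 
configuration described above. With acks=1 the leader acknowledges a batch as 
soon as it is written locally, so if the leader fails before followers fetch 
that batch, acknowledged records can be lost on leader failover.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AcksExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("compression.type", "gzip"); // as in the setup described above
            // acks=1: only the leader must persist the batch before acknowledging.
            props.put("acks", "1");
            // props.put("acks", "all"); // safer: wait for all in-sync replicas

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("its-events", "key", "value"));
            }
        }
    }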

[jira] [Commented] (KAFKA-3919) Broker fails to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-09-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522629#comment-15522629
 ] 

Ismael Juma commented on KAFKA-3919:


Thanks [~BigAndy]. It's worth noting that the discussion moved to KAFKA-1211 
after Jun's last comment above. I'll leave it to Jun to comment on whether the 
KAFKA-1211 fix being discussed is also relevant for the most recent incident 
that did not involve an ungraceful shutdown.


[jira] [Commented] (KAFKA-3919) Broker fails to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-09-26 Thread Andy Coates (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522583#comment-15522583
 ] 

Andy Coates commented on KAFKA-3919:


We experienced a similar incident again recently. This time the cluster was 
destabilised when there were intermittent connectivity issues with ZK for a few 
minutes. This resulted in a quick succession of Controller elections and 
changes, and ISR shrinks and grows. During the ISR shrinks and grows, some 
brokers seem to have got themselves into the same inconsistent state as above 
and halted. To recover the brokers we needed to manually delete the corrupted 
log segments.

So it looks like the above issue is not related, or not just related, to 
ungraceful shutdown. This inconsistent state appears to be possible during ISR 
changes where producers are producing with acks=1, though obviously the ZK 
connectivity issues will likely have played a role.



[jira] [Resolved] (KAFKA-3404) Issues running the kafka system test Sasl

2016-09-26 Thread Mickael Maison (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-3404.
---
Resolution: Cannot Reproduce

> Issues running the kafka system test Sasl
> -
>
> Key: KAFKA-3404
> URL: https://issues.apache.org/jira/browse/KAFKA-3404
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Affects Versions: 0.9.0.1
> Environment: Intel x86_64
> Ubuntu 14.04
>Reporter: Mickael Maison
>
> Hi,
> I'm trying to run the test_console_consumer.py system test and it's failing 
> while testing the SASL protocols. 
> [INFO  - 2016-03-15 14:41:58,533 - runner - log - lineno:211]: 
> SerialTestRunner: 
> kafkatest.sanity_checks.test_console_consumer.ConsoleConsumerTest.test_lifecycle.security_protocol=SASL_SSL:
>  Summary: Kafka server didn't finish startup
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner.py", 
> line 102, in run_all_tests
> result.data = self.run_single_test()
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner.py", 
> line 154, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 
> 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/mickael/ibm/messagehub/kafka/tests/kafkatest/sanity_checks/test_console_consumer.py",
>  line 54, in test_lifecycle
> self.kafka.start()
>   File 
> "/home/mickael/ibm/messagehub/kafka/tests/kafkatest/services/kafka/kafka.py", 
> line 81, in start
> Service.start(self)
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/services/service.py", 
> line 140, in start
> self.start_node(node)
>   File 
> "/home/mickael/ibm/messagehub/kafka/tests/kafkatest/services/kafka/kafka.py", 
> line 124, in start_node
> monitor.wait_until("Kafka Server.*started", timeout_sec=30, 
> err_msg="Kafka server didn't finish startup")
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/cluster/remoteaccount.py", 
> line 303, in wait_until
> return wait_until(lambda: self.acct.ssh("tail -c +%d %s | grep '%s'" % 
> (self.offset+1, self.log, pattern), allow_fail=True) == 0, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 
> 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Kafka server didn't finish startup
> Looking at the logs from the kafka worker, I can see that Kafka is not able 
> to connect to the kerberos server:
> [2016-03-15 14:41:28,751] FATAL [Kafka Server 1], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Connection refused
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:74)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:60)
>   at kafka.network.Processor.<init>(SocketServer.scala:379)
>   at 
> kafka.network.SocketServer$$anonfun$startup$1$$anonfun$apply$1.apply$mcVI$sp(SocketServer.scala:96)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>   at 
> kafka.network.SocketServer$$anonfun$startup$1.apply(SocketServer.scala:95)
>   at 
> kafka.network.SocketServer$$anonfun$startup$1.apply(SocketServer.scala:91)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
>   at kafka.network.SocketServer.startup(SocketServer.scala:91)
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:179)
>   at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
>   at kafka.Kafka$.main(Kafka.scala:67)
>   at kafka.Kafka.main(Kafka.scala)
> Looking at the kerberos worker, I can see it was started fine:
> Standalone MiniKdc Running
> ---
>   Realm   : EXAMPLE.COM
>   Running at  : worker4:worker4
>   krb5conf: /mnt/minikdc/krb5.conf
>   created keytab  : /mnt/minikdc/keytab
>   with principals : [client, kafka/worker2]
>  Do <CTRL-C> or kill <pid> to stop it
> ---
> Running netstat on the kerberos worker, I can see that it's listening on 
> 47385:
> vagrant@worker4:~$ netstat -ano
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address   Foreign Address State 
>   Timer
> tcp0  0 0.0.0.0:111 0.0.0.0:*   LISTEN
>   off (0.00/0/0)
> tcp0   

[jira] [Commented] (KAFKA-3404) Issues running the kafka system test Sasl

2016-09-26 Thread Mickael Maison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522523#comment-15522523
 ] 

Mickael Maison commented on KAFKA-3404:
---

Reran the system tests on Ubuntu 16.04 and didn't hit the issue. Marking as 
resolved


Re: Help related to KStreams and Stateful event processing.

2016-09-26 Thread Michael Noll
Arvind,

your use cases sound very similar to what Bobby Calderwood recently
described and presented at StrangeLoop this year:

Commander: Better Distributed Applications through CQRS, Event Sourcing,
and Immutable Logs
https://speakerdeck.com/bobbycalderwood/commander-better-distributed-applications-through-cqrs-event-sourcing-and-immutable-logs
(The link also includes pointers to the recorded talk and example code.)

See also
http://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
for a higher-level introduction into such architectures.

Hope this helps!
Michael





On Mon, Sep 26, 2016 at 11:02 AM, Michael Noll  wrote:

> PS: Arvind, I think typically such questions should be sent to the
> kafka-user mailing list (not to kafka-dev).
>
> On Mon, Sep 26, 2016 at 3:41 AM, Matthias J. Sax 
> wrote:
>
>> Hi Arvind,
>>
>> short answer, Kafka Streams does definitely help you!
>>
>> Long answer, Kafka Streams offers two layers to program your stream
>> processing job: the low-level Processor API and the high-level DSL.
>> Please check the documentation to get further details:
>> http://docs.confluent.io/3.0.1/streams/index.html
>>
>> With the Processor API you are able to do anything -- at the cost of lower
>> abstraction and thus more coding. I guess this would be the best way
>> to program your Kafka Streams application for your use case.
>>
>> The DSL is easier to use and provides high-level abstractions --
>> however, I am not sure if it covers what you need in your use case. But
>> maybe it's worth trying it out before using the Processor API...
>>
>> For your second question, I would recommend using the Processor API,
>> attaching a state store to a processor node, and writing to your data
>> warehouse whenever a state is "complete" (see
>> http://docs.confluent.io/3.0.1/streams/developer-guide.html#
>> defining-a-state-store).
>>
>> One more hint: you can actually mix the DSL and the Processor API by
>> using e.g. process() or transform() within the DSL.
>>
>>
>> Hope this gives you some initial pointers. Please follow up if you have
>> more questions.
>>
>> -Matthias
>>
>>
>> On 09/24/2016 11:25 PM, Arvind Kalyan wrote:
>> > Hello there!
>> >
>> > I read about Kafka Streams recently; it's pretty interesting the way it
>> > solves the stream processing problem in a cleaner way with fewer
>> > overheads and complexities.
>> >
>> > I work as a Software Engineer in a startup, and we are in the design
>> > stage of building a stream processing pipeline (if you will) for the
>> > millions of events we get every day. We have been using a Kafka cluster
>> > as the log aggregation layer in production for 5-6 months and are very
>> > happy with it.
>> >
>> > I went through a few Confluent blogs (by Jay, Neha) on how KStreams
>> > solves sort-of stateful event processing; maybe I missed the whole
>> > point in this regard, but I have some doubts.
>> >
>> > We have use-cases like the following:
>> >
>> > There is an event E1, which is sort-of the base event, after which we
>> > have a lot of sub-events E2,E3,...,En enriching E1 with a lot of extra
>> > properties (with considerable delay, say 30-40 mins).
>> >
>> > Eg. 1: An order event has come in where the user has ordered an item on
>> > our website (this is the base event). After say 30-40 minutes, we get
>> > events like packaging_time, shipping_time, delivered_time or
>> > cancelled_time etc. related to that order (these are the sub-events).
>> >
>> > So before we get the whole event to a warehouse, we need to collect all
>> > of these (ordered, packaged, shipped, cancelled/delivered), and whenever
>> > I get a cancelled or delivered event for an order, I know that completes
>> > the lifecycle for that order, and can put it in the warehouse.
>> >
>> > Eg. 2: User login events - if we are to capture events like
>> > User-Logged-In and User-Logged-Out, I need them to be in the warehouse
>> > as a single row. E.g. a row would have these columns: *user_id,
>> > login_time, logout_time*. So as and when I receive a logout event (and
>> > if I have the login event stored in some store), there would be a
>> > trigger which combines both and sends it across to the warehouse.
>> >
>> > All these involve storing the state of the events and acting as-and-when
>> > another event (that completes the lifecycle) occurs, after which you
>> > have a trigger for further steps (warehouse or anything else).
>> >
>> > Does KStreams help me do this? If not, how should I go about solving
>> > this problem?
>> >
>> > Also, I wanted some advice as to whether it is standard practice to
>> > aggregate like this and *then* store to the warehouse, or whether I
>> > should append each event into the warehouse and do sort-of an ELT on
>> > that using the warehouse (run a query to re-structure the data in the
>> > database and store it off as a separate table)?
>> >
>> > Eagerly waiting for your reply,
>> > Arvind
>> >
>>
>>
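As an illustration of the state-store approach Matthias describes above, here 
is a minimal Kafka Streams Processor API sketch for the order-lifecycle use 
case. This is only a sketch under assumptions not in the thread: 
String-serialized events keyed by order id, and hypothetical topic, store, and 
event names.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.TopologyBuilder;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.Stores;

    public class OrderLifecycleExample {

        // Buffers sub-events per order id in a state store and emits one
        // combined record when a terminal event completes the lifecycle.
        static class OrderLifecycleProcessor implements Processor<String, String> {
            private ProcessorContext context;
            private KeyValueStore<String, String> store;

            @Override
            @SuppressWarnings("unchecked")
            public void init(ProcessorContext context) {
                this.context = context;
                this.store = (KeyValueStore<String, String>) context.getStateStore("order-store");
            }

            @Override
            public void process(String orderId, String event) {
                String soFar = store.get(orderId);
                String merged = (soFar == null) ? event : soFar + "|" + event;
                if (event.startsWith("delivered") || event.startsWith("cancelled")) {
                    context.forward(orderId, merged); // lifecycle complete: send downstream
                    store.delete(orderId);            // state no longer needed
                } else {
                    store.put(orderId, merged);       // wait for more sub-events
                }
            }

            @Override
            public void punctuate(long timestamp) { } // could expire stale orders here

            @Override
            public void close() { }
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-lifecycle");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            TopologyBuilder builder = new TopologyBuilder()
                    .addSource("orders", "order-events")
                    .addProcessor("lifecycle", OrderLifecycleProcessor::new, "orders")
                    .addStateStore(Stores.create("order-store")
                            .withStringKeys().withStringValues().persistent().build(), "lifecycle")
                    .addSink("warehouse", "completed-orders", "lifecycle");

            new KafkaStreams(builder, props).start();
        }
    }

The same pattern would cover the login/logout example: store the login event 
keyed by user id, and forward the combined row when the logout arrives.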

Re: Help related to KStreams and Stateful event processing.

2016-09-26 Thread Michael Noll
PS: Arvind, I think typically such questions should be sent to the
kafka-user mailing list (not to kafka-dev).



[GitHub] kafka pull request #1900: [DOCS]: fix ambiguity - consuming by multiple cons...

2016-09-26 Thread pilloPl
Github user pilloPl closed the pull request at:

https://github.com/apache/kafka/pull/1900




[GitHub] kafka pull request #1900: [DOCS]: fix ambiguity - consuming by multiple cons...

2016-09-26 Thread pilloPl
GitHub user pilloPl reopened a pull request:

https://github.com/apache/kafka/pull/1900

[DOCS]: fix ambiguity - consuming by multiple consumers, but by exact…

In the doc it says:

_"Our topic is divided into a set of totally ordered partitions, each of 
which is consumed by one consumer at any given time."_

And consumer is described as: 

_"We'll call **processes** that subscribe to topics and process the feed of 
published messages **consumers**."_

Which might lead to a wrong conclusion - that each partition can be read by 
only one process at any given time.

I think this statement misses information about **consumer groups**, so I 
propose:

_"Our topic is divided into a set of totally ordered partitions, each of 
which is consumed by exactly one consumer (from each subscribed consumer 
group) at any given time"_

This contribution is my original work and I license the work to the project 
under the project's open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pilloPl/kafka minor/doc-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1900.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1900


commit 2280b5681313540c21c3f30a1c7afd37e6943b3a
Author: pilo 
Date:   2016-09-22T12:37:15Z

fix ambiguity in docs - consuming by multiple consumers, but by exactly 
one from each given consumer group




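To make the consumer-group semantics concrete, here is a minimal sketch using 
the standard Java consumer (the topic and group names are hypothetical). 
Running two copies of this program with the same group.id splits the topic's 
partitions between them, so each partition is read by exactly one consumer in 
the group at any given time; a second group with a different group.id would 
independently read every partition.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GroupedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "my-group"); // consumers sharing this id form one group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
            }
        }
    }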


[jira] [Updated] (KAFKA-4216) Replication Quotas: Control Leader & Follower Throttled Replicas Separately

2016-09-26 Thread Ben Stopford (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Stopford updated KAFKA-4216:

Summary: Replication Quotas: Control Leader & Follower Throttled Replicas 
Separately  (was: Replication Throttling: Leader may not be throttled if it is 
not "moving")

> Replication Quotas: Control Leader & Follower Throttled Replicas Separately
> ---
>
> Key: KAFKA-4216
> URL: https://issues.apache.org/jira/browse/KAFKA-4216
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Ben Stopford
>Assignee: Ben Stopford
>
> The current algorithm in kafka-reassign-partitions applies a throttle to all 
> moving replicas, be they leader-side or follower-side. 
> A preferable solution would be to change the throttled replica list to 
> specify whether the throttle applies to leader or follower. That way we can 
> ensure that the regular replication will not be throttled.  
> To do this we should change the way the throttled replica list is specified 
> so that it is spread over two separate properties: one that corresponds to the 
> leader-side throttle, and another that corresponds to the follower-side 
> throttle.
> quota.leader.replication.throttled.replicas = 
> [partId]:[replica],[partId]:[replica],[partId]:[replica] 
> quota.follower.replication.throttled.replicas = 
> [partId]:[replica],[partId]:[replica],[partId]:[replica] 
> Then, when applying the throttle, the leader quota can be applied to all 
> current replicas, and the follower quota can be applied only to the new 
> replicas we are creating as part of the rebalance. 
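For illustration, a concrete instance of the proposed [partId]:[replica] format 
(the broker ids 101-103 are hypothetical), throttling the existing replicas of 
partitions 0 and 1 on the leader side and only the newly created replica on 
broker 103 on the follower side:

    quota.leader.replication.throttled.replicas=0:101,0:102,1:101,1:102
    quota.follower.replication.throttled.replicas=0:103,1:103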



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)