[jira] [Updated] (KAFKA-3378) Client blocks forever if SocketChannel connects instantly

2016-03-20 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-3378:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Client blocks forever if SocketChannel connects instantly
> -
>
> Key: KAFKA-3378
> URL: https://issues.apache.org/jira/browse/KAFKA-3378
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Larkin Lowrey
>Assignee: Larkin Lowrey
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Observed that some consumers were blocked in Fetcher.listOffset() when 
> starting many dozens of consumer threads at the same time.
> Selector.connect(...) calls SocketChannel.connect() in non-blocking mode and 
> assumes that false is always returned and that the channel will be in the 
> Selector's readyKeys once the connection is ready for connect completion due 
> to the OP_CONNECT interest op.
> When connect() returns true the channel is fully connected and will not be 
> included in readyKeys since only OP_CONNECT is set.
> I implemented a fix which handles the case when connect(...) returns true and 
> verified that I no longer see stuck consumers. A git pull request will be 
> forthcoming.
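
For readers unfamiliar with this NIO corner case, below is a minimal, self-contained 
sketch of the kind of handling the report describes; class and field names are 
illustrative, and this is not the actual Kafka patch. The key point is that the boolean 
returned by SocketChannel.connect() must be checked, and channels that connect instantly 
must be tracked separately, because OP_CONNECT will never fire for them.

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

public class ImmediateConnectSketch {
    private final Selector nioSelector;
    // channels that finished connecting inside connect() and therefore
    // will never appear in selectedKeys() with OP_CONNECT
    private final List<SelectionKey> immediatelyConnectedKeys = new ArrayList<>();

    public ImmediateConnectSketch(Selector nioSelector) {
        this.nioSelector = nioSelector;
    }

    public void connect(String id, InetSocketAddress address) throws IOException {
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        boolean connected = channel.connect(address); // true if the connect completed instantly
        SelectionKey key = channel.register(nioSelector, SelectionKey.OP_CONNECT);
        key.attach(id);
        if (connected) {
            // OP_CONNECT will never be selected for this key, so remember it explicitly
            immediatelyConnectedKeys.add(key);
            key.interestOps(0);
        }
    }

    public void poll(long timeoutMs) throws IOException {
        nioSelector.select(timeoutMs);
        // handle instantly connected channels in addition to nioSelector.selectedKeys()
        for (SelectionKey key : immediatelyConnectedKeys) {
            SocketChannel channel = (SocketChannel) key.channel();
            channel.finishConnect(); // no-op for an already connected channel
            key.interestOps(SelectionKey.OP_READ); // continue with normal processing
        }
        immediatelyConnectedKeys.clear();
    }
}
{code}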



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1173) Using Vagrant to get up and running with Apache Kafka

2016-03-20 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203751#comment-15203751
 ] 

Gwen Shapira commented on KAFKA-1173:
-

[~ewencp] Does it make sense to copy vagrant instructions to our documentation? 
I think it will increase visibility for searches.

> Using Vagrant to get up and running with Apache Kafka
> -
>
> Key: KAFKA-1173
> URL: https://issues.apache.org/jira/browse/KAFKA-1173
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Joe Stein
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-1173-JMX.patch, KAFKA-1173.patch, 
> KAFKA-1173_2013-12-07_12:07:55.patch, KAFKA-1173_2014-11-11_13:50:55.patch, 
> KAFKA-1173_2014-11-12_11:32:09.patch, KAFKA-1173_2014-11-18_16:01:33.patch
>
>
> Vagrant has been getting a lot of pickup in the tech communities. I have 
> found it very useful for development and testing, and I am working with a few 
> clients who now use it to help virtualize their environments in repeatable ways.
> Using Vagrant to get up and running.
> For 0.8.0 I have a patch on github https://github.com/stealthly/kafka
> 1) Install Vagrant [http://www.vagrantup.com/](http://www.vagrantup.com/)
> 2) Install Virtual Box 
> [https://www.virtualbox.org/](https://www.virtualbox.org/)
> In the main kafka folder
> 1) ./sbt update
> 2) ./sbt package
> 3) ./sbt assembly-package-dependency
> 4) vagrant up
> once this is done 
> * Zookeeper will be running on 192.168.50.5
> * Broker 1 on 192.168.50.10
> * Broker 2 on 192.168.50.20
> * Broker 3 on 192.168.50.30
> When you are all up and running you will be back at a command prompt.  
> If you want you can log in to the machines using vagrant ssh  but 
> you don't need to.
> You can access the brokers and zookeeper by their IP
> e.g.
> bin/kafka-console-producer.sh --broker-list 
> 192.168.50.10:9092,192.168.50.20:9092,192.168.50.30:9092 --topic sandbox
> bin/kafka-console-consumer.sh --zookeeper 192.168.50.5:2181 --topic sandbox 
> --from-beginning



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #461

2016-03-20 Thread Apache Jenkins Server
See 

Changes:

[junrao] MINOR: Fix FetchRequest.getErrorResponse for version 1

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-us1 (Ubuntu ubuntu ubuntu-us golang-ppa) in 
workspace 
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 4f048c4f194a90ded5f0df35e4e23379272d5bc6 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4f048c4f194a90ded5f0df35e4e23379272d5bc6
 > git rev-list 95eabc8c8b383af84466d4c2cfafd0920e5a52ee # timeout=10
Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson832369718185749045.sh
+ /bin/gradle
/tmp/hudson832369718185749045.sh: line 2: /bin/gradle: No such file or directory
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=


Build failed in Jenkins: kafka-trunk-jdk7 #1131

2016-03-20 Thread Apache Jenkins Server
See 

Changes:

[junrao] KAFKA-3378; Follow-up to ensure we `finishConnect` for immediately

[junrao] MINOR: Fix FetchRequest.getErrorResponse for version 1

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H11 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 4f048c4f194a90ded5f0df35e4e23379272d5bc6 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4f048c4f194a90ded5f0df35e4e23379272d5bc6
 > git rev-list c188a68e2b487191f1f3004e22b68c21e26c3f2e # timeout=10
Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson1107474416921505756.sh
+ /bin/gradle
/tmp/hudson1107474416921505756.sh: line 2: /bin/gradle: No such file or 
directory
Build step 'Execute shell' marked build as failure
Recording test results
Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 3 days 2 hr old

Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51


[jira] [Commented] (KAFKA-3378) Client blocks forever if SocketChannel connects instantly

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203660#comment-15203660
 ] 

ASF GitHub Bot commented on KAFKA-3378:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1103


> Client blocks forever if SocketChannel connects instantly
> -
>
> Key: KAFKA-3378
> URL: https://issues.apache.org/jira/browse/KAFKA-3378
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Larkin Lowrey
>Assignee: Larkin Lowrey
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Observed that some consumers were blocked in Fetcher.listOffset() when 
> starting many dozens of consumer threads at the same time.
> Selector.connect(...) calls SocketChannel.connect() in non-blocking mode and 
> assumes that false is always returned and that the channel will be in the 
> Selector's readyKeys once the connection is ready for connect completion due 
> to the OP_CONNECT interest op.
> When connect() returns true the channel is fully connected and will not be 
> included in readyKeys since only OP_CONNECT is set.
> I implemented a fix which handles the case when connect(...) returns true and 
> verified that I no longer see stuck consumers. A git pull request will be 
> forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3378; Follow-up to ensure we `finishConn...

2016-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1103


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-03-20 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203615#comment-15203615
 ] 

Maysam Yabandeh commented on KAFKA-3428:


I believe a volatile variable takes care of the Metadata.update method, which 
can update the pointer to this.cluster. I am not sure how the Cluster object 
itself having an update method could cause an issue. In the current 
implementation, after invoking fetch(), the caller holds a pointer to a Cluster 
object which might be replaced by a concurrent invocation of Metadata.update() 
while the caller is still processing that Cluster object. This would still be 
the case after a non-synchronized fetch returns the volatile this.cluster. Can 
you explain more about the unwanted cases in which Cluster.update might cause a 
problem once cluster becomes a volatile variable?
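
To make the proposal concrete, here is a minimal sketch of the change under 
discussion, assuming memory visibility is the only requirement; it is an 
illustration, not the actual org.apache.kafka.clients.Metadata code.

{code}
import org.apache.kafka.common.Cluster;

public final class MetadataSketch {
    // volatile: a reader of fetch() always sees the most recently published
    // Cluster reference, without taking the Metadata lock
    private volatile Cluster cluster = Cluster.empty();

    // no longer synchronized: just hands out the current snapshot reference
    public Cluster fetch() {
        return cluster;
    }

    // update stays synchronized and swaps in a brand-new Cluster object, so a
    // caller holding the old reference keeps reading a consistent snapshot
    public synchronized void update(Cluster newCluster) {
        this.cluster = newCluster;
        notifyAll();
    }
}
{code}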

> Remove metadata sync bottleneck from mirrormaker's producer
> ---
>
> Key: KAFKA-3428
> URL: https://issues.apache.org/jira/browse/KAFKA-3428
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>
> Due to sync on the single producer, MM in a setup with 32 consumer threads 
> could not send more than 
> 358k msg/sec and hence could not saturate the NIC. Profiling showed that 
> producer.send takes 0.080 ms on average, which explains the bottleneck of 
> 358k msg/sec. The following explains the bottleneck in producer.send and 
> suggests how to improve it.
> The current impl of MM relies on a single producer. For EACH message, 
> producer.send() calls waitOnMetadata, which runs the following code:
> {code}
> // add topic to metadata topic list if it is not there already.
> if (!this.metadata.containsTopic(topic))
> this.metadata.add(topic);
> {code}
> Although the code is mostly a no-op, containsTopic is synchronized, so it 
> becomes the bottleneck in MM.
> Profiling highlights this bottleneck:
> {code}
> 100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
>   18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   13.8% - 9,056 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
>   1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
>   2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
>   2.2% - 1,442 ms 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> {code}
> After replacing this bottleneck with a no-op, another run of the 
> profiler shows that fetch is the next bottleneck:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   132 s (54 %)   n/a n/a
>   java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
>   org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)n/a 
> n/a
>   6.8% - 16,546 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
> {code}
> However, the fetch method does not need to be synchronized:
> {code}
> public synchronized Cluster fetch() {
> return this.cluster;
> }
> {code}
> Removing sync from the fetch method shows that the bottleneck disappears:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   249 s (78 %)   n/a n/a
>   org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel  
>  24,489 ms (7 %)n/a n/a
>   org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)
> n/a n/a
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append
>  13,817 ms (4 %)n/a n/a
>   4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
> {code}
> Internally we have applied a patch to remove this bottleneck. The patch does 
> the following:
> 1. replace HashSet with a concurrent hash set
> 2. remove sync from containsTopic and fetch
> 3. pass a replica of topics to getClusterForCurrentTopics, since this 
> synchronized method accesses topics at two locations, and topics changing in 
> the middle might mess with the semantics.
> Any interest in applying this patch? Any alternative suggestions?
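
As a rough illustration of items 1 and 2 in the patch description above (and of 
why item 3 needs a snapshot), here is a sketch assuming the topic set only 
requires thread-safe contains/add; it shows the proposed direction, not the 
actual patch.

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public final class TopicRegistrySketch {
    // concurrent set: containsTopic/add no longer need the Metadata lock
    private final Set<String> topics =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    public boolean containsTopic(String topic) { // formerly synchronized
        return topics.contains(topic);
    }

    public void add(String topic) {              // formerly synchronized
        topics.add(topic);
    }

    // item 3: methods that read the set more than once should work on a copy,
    // so a concurrent add cannot change what they see half-way through
    public Set<String> topicsSnapshot() {
        return new HashSet<>(topics);
    }
}
{code}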



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-03-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203603#comment-15203603
 ] 

Ismael Juma commented on KAFKA-3428:


OK, so you mean replacing `synchronized` by `volatile`. That works in cases 
where we only care about memory visibility, but I think some analysis needs to 
be done to verify if that is the case here. Note that `Metadata` has a 
synchronized `update` method and `Cluster` also has an `update` method. I think 
we may be able to remove `Cluster.update` to simplify the reasoning.

> Remove metadata sync bottleneck from mirrormaker's producer
> ---
>
> Key: KAFKA-3428
> URL: https://issues.apache.org/jira/browse/KAFKA-3428
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>
> Due to sync on the single producer, MM in a setup with 32 consumer threads 
> could not send more than 
> 358k msg/sec and hence could not saturate the NIC. Profiling showed that 
> producer.send takes 0.080 ms on average, which explains the bottleneck of 
> 358k msg/sec. The following explains the bottleneck in producer.send and 
> suggests how to improve it.
> The current impl of MM relies on a single producer. For EACH message, 
> producer.send() calls waitOnMetadata, which runs the following code:
> {code}
> // add topic to metadata topic list if it is not there already.
> if (!this.metadata.containsTopic(topic))
> this.metadata.add(topic);
> {code}
> Although the code is mostly a no-op, containsTopic is synchronized, so it 
> becomes the bottleneck in MM.
> Profiling highlights this bottleneck:
> {code}
> 100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
>   18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   13.8% - 9,056 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
>   1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
>   2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
>   2.2% - 1,442 ms 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> {code}
> After replacing this bottleneck with a no-op, another run of the 
> profiler shows that fetch is the next bottleneck:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   132 s (54 %)   n/a n/a
>   java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
>   org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)n/a 
> n/a
>   6.8% - 16,546 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
> {code}
> However, the fetch method does not need to be synchronized:
> {code}
> public synchronized Cluster fetch() {
> return this.cluster;
> }
> {code}
> Removing sync from the fetch method shows that the bottleneck disappears:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   249 s (78 %)   n/a n/a
>   org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel  
>  24,489 ms (7 %)n/a n/a
>   org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)
> n/a n/a
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append
>  13,817 ms (4 %)n/a n/a
>   4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
> {code}
> Internally we have applied a patch to remove this bottleneck. The patch does 
> the following:
> 1. replace HashSet with a concurrent hash set
> 2. remove sync from containsTopic and fetch
> 3. pass a replica of topics to getClusterForCurrentTopics, since this 
> synchronized method accesses topics at two locations, and topics changing in 
> the middle might mess with the semantics.
> Any interest in applying this patch? Any alternative suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk7 #1130

2016-03-20 Thread Apache Jenkins Server
See 

Changes:

[me] MINOR: Add vagrant up wrapper for simple parallel bringup on aws

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H11 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision c188a68e2b487191f1f3004e22b68c21e26c3f2e 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c188a68e2b487191f1f3004e22b68c21e26c3f2e
 > git rev-list bfac36ad0e378b5f39e3889e40a75c5c1fc48fa7 # timeout=10
Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson6420227377391062609.sh
+ /bin/gradle
/tmp/hudson6420227377391062609.sh: line 2: /bin/gradle: No such file or 
directory
Build step 'Execute shell' marked build as failure
Recording test results
Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 2 days 23 hr old

Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51


[jira] [Commented] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-03-20 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203578#comment-15203578
 ] 

Maysam Yabandeh commented on KAFKA-3428:


Sure. Will do that.

bq. By the way, what is the reasoning for stating that `Metadata.fetch()` does 
not need to be `synchronized`?
Because all it does is return the pointer to this.cluster, and it is enough for 
cluster to be declared volatile to ensure safe access.

> Remove metadata sync bottleneck from mirrormaker's producer
> ---
>
> Key: KAFKA-3428
> URL: https://issues.apache.org/jira/browse/KAFKA-3428
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>
> Due to sync on the single producer, MM in a setup with 32 consumer threads 
> could not send more than 
> 358k msg/sec and hence could not saturate the NIC. Profiling showed that 
> producer.send takes 0.080 ms on average, which explains the bottleneck of 
> 358k msg/sec. The following explains the bottleneck in producer.send and 
> suggests how to improve it.
> The current impl of MM relies on a single producer. For EACH message, 
> producer.send() calls waitOnMetadata, which runs the following code:
> {code}
> // add topic to metadata topic list if it is not there already.
> if (!this.metadata.containsTopic(topic))
> this.metadata.add(topic);
> {code}
> Although the code is mostly a no-op, containsTopic is synchronized, so it 
> becomes the bottleneck in MM.
> Profiling highlights this bottleneck:
> {code}
> 100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
>   18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   13.8% - 9,056 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
>   1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
>   2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
>   2.2% - 1,442 ms 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> {code}
> After replacing this bottleneck with a no-op, another run of the 
> profiler shows that fetch is the next bottleneck:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   132 s (54 %)   n/a n/a
>   java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
>   org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)n/a 
> n/a
>   6.8% - 16,546 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
> {code}
> However, the fetch method does not need to be synchronized:
> {code}
> public synchronized Cluster fetch() {
> return this.cluster;
> }
> {code}
> Removing sync from the fetch method shows that the bottleneck disappears:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   249 s (78 %)   n/a n/a
>   org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel  
>  24,489 ms (7 %)n/a n/a
>   org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)
> n/a n/a
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append
>  13,817 ms (4 %)n/a n/a
>   4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
> {code}
> Internally we have applied a patch to remove this bottleneck. The patch does 
> the following:
> 1. replace HashSet with a concurrent hash set
> 2. remove sync from containsTopic and fetch
> 3. pass a replica of topics to getClusterForCurrentTopics, since this 
> synchronized method accesses topics at two locations, and topics changing in 
> the middle might mess with the semantics.
> Any interest in applying this patch? Any alternative suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #459

2016-03-20 Thread Apache Jenkins Server
See 

Changes:

[me] MINOR: Add vagrant up wrapper for simple parallel bringup on aws

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-us1 (Ubuntu ubuntu ubuntu-us golang-ppa) in 
workspace 
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision c188a68e2b487191f1f3004e22b68c21e26c3f2e 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c188a68e2b487191f1f3004e22b68c21e26c3f2e
 > git rev-list bfac36ad0e378b5f39e3889e40a75c5c1fc48fa7 # timeout=10
Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson4330940705045745467.sh
+ /bin/gradle
/tmp/hudson4330940705045745467.sh: line 2: /bin/gradle: No such file or 
directory
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=


[jira] [Commented] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-03-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203573#comment-15203573
 ] 

Ismael Juma commented on KAFKA-3428:


By the way, what is the reasoning for stating that `Metadata.fetch()` does not 
need to be `synchronized`?

> Remove metadata sync bottleneck from mirrormaker's producer
> ---
>
> Key: KAFKA-3428
> URL: https://issues.apache.org/jira/browse/KAFKA-3428
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>
> Due to sync on the single producer, MM in a setup with 32 consumer threads 
> could not send more than 
> 358k msg/sec and hence could not saturate the NIC. Profiling showed that 
> producer.send takes 0.080 ms on average, which explains the bottleneck of 
> 358k msg/sec. The following explains the bottleneck in producer.send and 
> suggests how to improve it.
> The current impl of MM relies on a single producer. For EACH message, 
> producer.send() calls waitOnMetadata, which runs the following code:
> {code}
> // add topic to metadata topic list if it is not there already.
> if (!this.metadata.containsTopic(topic))
> this.metadata.add(topic);
> {code}
> Although the code is mostly a no-op, containsTopic is synchronized, so it 
> becomes the bottleneck in MM.
> Profiling highlights this bottleneck:
> {code}
> 100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
>   18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   13.8% - 9,056 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
>   1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
>   2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
>   2.2% - 1,442 ms 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> {code}
> After replacing this bottleneck with a no-op, another run of the 
> profiler shows that fetch is the next bottleneck:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   132 s (54 %)   n/a n/a
>   java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
>   org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)n/a 
> n/a
>   6.8% - 16,546 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
> {code}
> However, the fetch method does not need to be synchronized:
> {code}
> public synchronized Cluster fetch() {
> return this.cluster;
> }
> {code}
> Removing sync from the fetch method shows that the bottleneck disappears:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   249 s (78 %)   n/a n/a
>   org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel  
>  24,489 ms (7 %)n/a n/a
>   org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)
> n/a n/a
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append
>  13,817 ms (4 %)n/a n/a
>   4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
> {code}
> Internally we have applied a patch to remove this bottleneck. The patch does 
> the following:
> 1. replace HashSet with a concurrent hash set
> 2. remove sync from containsTopic and fetch
> 3. pass a replica of topics to getClusterForCurrentTopics, since this 
> synchronized method accesses topics at two locations, and topics changing in 
> the middle might mess with the semantics.
> Any interest in applying this patch? Any alternative suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-03-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203572#comment-15203572
 ] 

Ismael Juma commented on KAFKA-3428:


Seems like a nice improvement.

Since you have the changes locally, I suggest you submit a pull request as per 
http://kafka.apache.org/contributing.html. The details can then be discussed 
within the pull request.

> Remove metadata sync bottleneck from mirrormaker's producer
> ---
>
> Key: KAFKA-3428
> URL: https://issues.apache.org/jira/browse/KAFKA-3428
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>
> Due to sync on the single producer, MM in a setup with 32 consumer threads 
> could not send more than 
> 358k msg/sec and hence could not saturate the NIC. Profiling showed that 
> producer.send takes 0.080 ms on average, which explains the bottleneck of 
> 358k msg/sec. The following explains the bottleneck in producer.send and 
> suggests how to improve it.
> The current impl of MM relies on a single producer. For EACH message, 
> producer.send() calls waitOnMetadata, which runs the following code:
> {code}
> // add topic to metadata topic list if it is not there already.
> if (!this.metadata.containsTopic(topic))
> this.metadata.add(topic);
> {code}
> Although the code is mostly a no-op, containsTopic is synchronized, so it 
> becomes the bottleneck in MM.
> Profiling highlights this bottleneck:
> {code}
> 100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
>   18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   13.8% - 9,056 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
>   1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
>   2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
>   2.2% - 1,442 ms 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> {code}
> After replacing this bottleneck with a no-op, another run of the 
> profiler shows that fetch is the next bottleneck:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   132 s (54 %)   n/a n/a
>   java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
>   org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)n/a 
> n/a
>   6.8% - 16,546 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
> {code}
> However, the fetch method does not need to be synchronized:
> {code}
> public synchronized Cluster fetch() {
> return this.cluster;
> }
> {code}
> Removing sync from the fetch method shows that the bottleneck disappears:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   249 s (78 %)   n/a n/a
>   org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel  
>  24,489 ms (7 %)n/a n/a
>   org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)
> n/a n/a
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append
>  13,817 ms (4 %)n/a n/a
>   4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
> {code}
> Internally we have applied a patch to remove this bottleneck. The patch does 
> the following:
> 1. replace HashSet with a concurrent hash set
> 2. remove sync from containsTopic and fetch
> 3. pass a replica of topics to getClusterForCurrentTopics, since this 
> synchronized method accesses topics at two locations, and topics changing in 
> the middle might mess with the semantics.
> Any interest in applying this patch? Any alternative suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: MINOR: Add vagrant up wrapper for simple paral...

2016-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/982


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3378) Client blocks forever if SocketChannel connects instantly

2016-03-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3378:
---
Status: Patch Available  (was: Reopened)

> Client blocks forever if SocketChannel connects instantly
> -
>
> Key: KAFKA-3378
> URL: https://issues.apache.org/jira/browse/KAFKA-3378
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Larkin Lowrey
>Assignee: Larkin Lowrey
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Observed that some consumers were blocked in Fetcher.listOffset() when 
> starting many dozens of consumer threads at the same time.
> Selector.connect(...) calls SocketChannel.connect() in non-blocking mode and 
> assumes that false is always returned and that the channel will be in the 
> Selector's readyKeys once the connection is ready for connect completion due 
> to the OP_CONNECT interest op.
> When connect() returns true the channel is fully connected and will not be 
> included in readyKeys since only OP_CONNECT is set.
> I implemented a fix which handles the case when connect(...) returns true and 
> verified that I no longer see stuck consumers. A git pull request will be 
> forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3378) Client blocks forever if SocketChannel connects instantly

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203570#comment-15203570
 ] 

ASF GitHub Bot commented on KAFKA-3378:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1103

KAFKA-3378; Follow-up to ensure we `finishConnect` for immediately 
connected keys



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka kafka-3378-follow-up

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1103


commit 91858d74752f72675a662c6c869b44e8e443b0e1
Author: Ismael Juma 
Date:   2016-03-20T23:42:32Z

KAFKA-3378; Follow-up to ensure we `finishConnect` for immediately connected




> Client blocks forever if SocketChannel connects instantly
> -
>
> Key: KAFKA-3378
> URL: https://issues.apache.org/jira/browse/KAFKA-3378
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Larkin Lowrey
>Assignee: Larkin Lowrey
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Observed that some consumers were blocked in Fetcher.listOffset() when 
> starting many dozens of consumer threads at the same time.
> Selector.connect(...) calls SocketChannel.connect() in non-blocking mode and 
> assumes that false is always returned and that the channel will be in the 
> Selector's readyKeys once the connection is ready for connect completion due 
> to the OP_CONNECT interest op.
> When connect() returns true the channel is fully connected and will not be 
> included in readyKeys since only OP_CONNECT is set.
> I implemented a fix which handles the case when connect(...) returns true and 
> verified that I no longer see stuck consumers. A git pull request will be 
> forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3378; Follow-up to ensure we `finishConn...

2016-03-20 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1103

KAFKA-3378; Follow-up to ensure we `finishConnect` for immediately 
connected keys



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka kafka-3378-follow-up

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1103


commit 91858d74752f72675a662c6c869b44e8e443b0e1
Author: Ismael Juma 
Date:   2016-03-20T23:42:32Z

KAFKA-3378; Follow-up to ensure we `finishConnect` for immediately connected




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-03-20 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203568#comment-15203568
 ] 

Maysam Yabandeh commented on KAFKA-3428:


The hardware spec needs to go through an auditing process before being shared 
publicly, so I am afraid I cannot offer such details at the moment.

After applying the patch, MM could copy the entire data stream (447k msg/sec) 
without lagging. The bytes-in rate was 183m per sec. We have not tried it yet 
with an input rate large enough to saturate a 10G NIC, and we might discover 
other bottlenecks when doing so.

> Remove metadata sync bottleneck from mirrormaker's producer
> ---
>
> Key: KAFKA-3428
> URL: https://issues.apache.org/jira/browse/KAFKA-3428
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>
> Due to sync on the single producer, MM in a setup with 32 consumer threads 
> could not send more than 
> 358k msg/sec and hence could not saturate the NIC. Profiling showed that 
> producer.send takes 0.080 ms on average, which explains the bottleneck of 
> 358k msg/sec. The following explains the bottleneck in producer.send and 
> suggests how to improve it.
> The current impl of MM relies on a single producer. For EACH message, 
> producer.send() calls waitOnMetadata, which runs the following code:
> {code}
> // add topic to metadata topic list if it is not there already.
> if (!this.metadata.containsTopic(topic))
> this.metadata.add(topic);
> {code}
> Although the code is mostly a no-op, containsTopic is synchronized, so it 
> becomes the bottleneck in MM.
> Profiling highlights this bottleneck:
> {code}
> 100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
>   18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   13.8% - 9,056 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
>   1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
>   2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
>   2.2% - 1,442 ms 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> {code}
> After replacing this bottleneck with a no-op, another run of the 
> profiler shows that fetch is the next bottleneck:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   132 s (54 %)   n/a n/a
>   java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
>   org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)n/a 
> n/a
>   6.8% - 16,546 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
> {code}
> However, the fetch method does not need to be synchronized:
> {code}
> public synchronized Cluster fetch() {
> return this.cluster;
> }
> {code}
> Removing sync from the fetch method shows that the bottleneck disappears:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   249 s (78 %)   n/a n/a
>   org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel  
>  24,489 ms (7 %)n/a n/a
>   org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)
> n/a n/a
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append
>  13,817 ms (4 %)n/a n/a
>   4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
> {code}
> Internally we have applied a patch to remove this bottleneck. The patch does 
> the following:
> 1. replace HashSet with a concurrent hash set
> 2. remove sync from containsTopic and fetch
> 3. pass a replica of topics to getClusterForCurrentTopics, since this 
> synchronized method accesses topics at two locations, and topics changing in 
> the middle might mess with the semantics.
> Any interest in applying this patch? Any alternative suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-3378) Client blocks forever if SocketChannel connects instantly

2016-03-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reopened KAFKA-3378:


> Client blocks forever if SocketChannel connects instantly
> -
>
> Key: KAFKA-3378
> URL: https://issues.apache.org/jira/browse/KAFKA-3378
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Larkin Lowrey
>Assignee: Larkin Lowrey
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Observed that some consumers were blocked in Fetcher.listOffset() when 
> starting many dozens of consumer threads at the same time.
> Selector.connect(...) calls SocketChannel.connect() in non-blocking mode and 
> assumes that false is always returned and that the channel will be in the 
> Selector's readyKeys once the connection is ready for connect completion due 
> to the OP_CONNECT interest op.
> When connect() returns true the channel is fully connected and will not be 
> included in readyKeys since only OP_CONNECT is set.
> I implemented a fix which handles the case when connect(...) returns true and 
> verified that I no longer see stuck consumers. A git pull request will be 
> forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3431) Move `BrokerEndPoint` from `o.a.k.common` to `o.a.k.common.internals`

2016-03-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3431:
---
Status: Patch Available  (was: Open)

> Move `BrokerEndPoint` from `o.a.k.common` to `o.a.k.common.internals`
> -
>
> Key: KAFKA-3431
> URL: https://issues.apache.org/jira/browse/KAFKA-3431
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> As per the following comment, we should move `BrokerEndPoint` from `common` 
> to `common.internals` as it's not public API.
> https://issues.apache.org/jira/browse/KAFKA-2970?focusedCommentId=15157821=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15157821



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3431) Move `BrokerEndPoint` from `o.a.k.common` to `o.a.k.common.internals`

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203567#comment-15203567
 ] 

ASF GitHub Bot commented on KAFKA-3431:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1102

KAFKA-3431; Move `BrokerEndPoint` from `o.a.k.common` to 
`o.a.k.common.internals`

Also included a minor efficiency improvement in `kafka.cluster.EndPoint`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-3431-broker-end-point-internals

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1102


commit 3bd3dabc4cd8684a580d41fe5d816d80876263b1
Author: Ismael Juma 
Date:   2016-03-20T23:27:10Z

KAFKA-3431: Move `BrokerEndPoint` from `o.a.k.common` to 
`o.a.k.common.internals`

commit f49de2dec4831f6497abd3c5028c13af4198c381
Author: Ismael Juma 
Date:   2016-03-20T23:28:05Z

Avoid compiling regex for `kafka.cluster.EndPoint` multiple times




> Move `BrokerEndPoint` from `o.a.k.common` to `o.a.k.common.internals`
> -
>
> Key: KAFKA-3431
> URL: https://issues.apache.org/jira/browse/KAFKA-3431
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> As per the following comment, we should move `BrokerEndPoint` from `common` 
> to `common.internals` as it's not public API.
> https://issues.apache.org/jira/browse/KAFKA-2970?focusedCommentId=15157821=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15157821



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3431; Move `BrokerEndPoint` from `o.a.k....

2016-03-20 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1102

KAFKA-3431; Move `BrokerEndPoint` from `o.a.k.common` to 
`o.a.k.common.internals`

Also included a minor efficiency improvement in `kafka.cluster.EndPoint`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-3431-broker-end-point-internals

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1102


commit 3bd3dabc4cd8684a580d41fe5d816d80876263b1
Author: Ismael Juma 
Date:   2016-03-20T23:27:10Z

KAFKA-3431: Move `BrokerEndPoint` from `o.a.k.common` to 
`o.a.k.common.internals`

commit f49de2dec4831f6497abd3c5028c13af4198c381
Author: Ismael Juma 
Date:   2016-03-20T23:28:05Z

Avoid compiling regex for `kafka.cluster.EndPoint` multiple times




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-3431) Move `BrokerEndPoint` from `o.a.k.common` to `o.a.k.common.internals`

2016-03-20 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-3431:
--

 Summary: Move `BrokerEndPoint` from `o.a.k.common` to 
`o.a.k.common.internals`
 Key: KAFKA-3431
 URL: https://issues.apache.org/jira/browse/KAFKA-3431
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Critical
 Fix For: 0.10.0.0


As per the following comment, we should move `BrokerEndPoint` from `common` to 
`common.internals` as it's not public API.

https://issues.apache.org/jira/browse/KAFKA-2970?focusedCommentId=15157821=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15157821



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2970) Both UpdateMetadataRequest.java and LeaderAndIsrRequest.java have an Endpoint class

2016-03-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-2970.

Resolution: Duplicate

Marking as duplicate of KAFKA-2757.

> Both UpdateMetadataRequest.java and LeaderAndIsrRequest.java have an Endpoint 
> class
> ---
>
> Key: KAFKA-2970
> URL: https://issues.apache.org/jira/browse/KAFKA-2970
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.0
>Reporter: Grant Henke
>Assignee: chen zhu
>
> Both UpdateMetadataRequest.java and LeaderAndIsrRequest.java have an Endpoint 
> class which contains the same information. These should be consolidated for 
> simplicity and interoperability. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1173) Using Vagrant to get up and running with Apache Kafka

2016-03-20 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203558#comment-15203558
 ] 

Ewen Cheslack-Postava commented on KAFKA-1173:
--

[~sfgower] Those instructions are out of date. The version that was ultimately 
checked in has instructions in vagrant/README.md

> Using Vagrant to get up and running with Apache Kafka
> -
>
> Key: KAFKA-1173
> URL: https://issues.apache.org/jira/browse/KAFKA-1173
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Joe Stein
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-1173-JMX.patch, KAFKA-1173.patch, 
> KAFKA-1173_2013-12-07_12:07:55.patch, KAFKA-1173_2014-11-11_13:50:55.patch, 
> KAFKA-1173_2014-11-12_11:32:09.patch, KAFKA-1173_2014-11-18_16:01:33.patch
>
>
> Vagrant has been getting a lot of pickup in the tech communities. I have 
> found it very useful for development and testing, and I am working with a few 
> clients who now use it to help virtualize their environments in repeatable ways.
> Using Vagrant to get up and running.
> For 0.8.0 I have a patch on github https://github.com/stealthly/kafka
> 1) Install Vagrant [http://www.vagrantup.com/](http://www.vagrantup.com/)
> 2) Install Virtual Box 
> [https://www.virtualbox.org/](https://www.virtualbox.org/)
> In the main kafka folder
> 1) ./sbt update
> 2) ./sbt package
> 3) ./sbt assembly-package-dependency
> 4) vagrant up
> once this is done 
> * Zookeeper will be running on 192.168.50.5
> * Broker 1 on 192.168.50.10
> * Broker 2 on 192.168.50.20
> * Broker 3 on 192.168.50.30
> When you are all up and running you will be back at a command prompt.  
> If you want you can log in to the machines using vagrant ssh  but 
> you don't need to.
> You can access the brokers and zookeeper by their IP
> e.g.
> bin/kafka-console-producer.sh --broker-list 
> 192.168.50.10:9092,192.168.50.20:9092,192.168.50.30:9092 --topic sandbox
> bin/kafka-console-consumer.sh --zookeeper 192.168.50.5:2181 --topic sandbox 
> --from-beginning



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3430) Allow users to set key in KTable.toStream() and KStream

2016-03-20 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-3430:


 Summary: Allow users to set key in KTable.toStream() and KStream
 Key: KAFKA-3430
 URL: https://issues.apache.org/jira/browse/KAFKA-3430
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang
 Fix For: 0.10.0.1


Currently KTable.toStream does not take any parameters, and hence users who 
want to set the key need two steps:

{code}table.toStream().map(...){code}

We can make it one step by providing the mapper parameter in toStream.

Similarly, today users usually need to call {code}KStream.map(){code} in order 
to select the key before an aggregation-by-key operation if the original 
stream does not contain keys. We can consider adding a specific function in 
KStream to do so:

{code}KStream.selectKey(mapper){code}

which is essentially the same as

{code}KStream.map(mapper, value){code}
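
A short sketch of how the two styles compare from the user's side; the record 
type Order and its userId() accessor are made up for illustration, and 
selectKey is shown as the proposed addition, not an existing method at the time 
of this issue.

{code}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;

public class SelectKeySketch {
    // hypothetical record type, purely for illustration
    interface Order { String userId(); }

    static KStream<String, Order> rekeyToday(KStream<byte[], Order> orders) {
        // current API: re-key with map(), repeating the value unchanged
        return orders.map((key, order) -> new KeyValue<>(order.userId(), order));
    }

    static KStream<String, Order> rekeyProposed(KStream<byte[], Order> orders) {
        // proposed API: selectKey() only computes the new key
        return orders.selectKey((key, order) -> order.userId());
    }
}
{code}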



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3429) Remove Serdes needed for repartitioning in KTable stateful operations

2016-03-20 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-3429:


 Summary: Remove Serdes needed for repartitioning in KTable 
stateful operations
 Key: KAFKA-3429
 URL: https://issues.apache.org/jira/browse/KAFKA-3429
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang
 Fix For: 0.10.0.1


Currently in KTable stateful operations, we require users to provide serdes 
(defaulting to the configured ones) for repartitioning. However, these are not 
necessary, since all KTable instances are either generated from topics 
directly:

{code}builder.table(...){code}

or from aggregation operations:

{code}stream.aggregate(...){code}

In both cases serdes are already provided for materializing the data, and hence 
the same serdes can be re-used when the resulting KTable is involved in future 
stateful operations. Implementation-wise we need to carry the serde information 
along with the KTableImpl instance in order to do this.
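
A rough sketch of that implementation note, assuming the serdes captured at 
materialization time can simply be remembered on the table object; the class 
below is illustrative and not the actual KTableImpl.

{code}
import org.apache.kafka.common.serialization.Serde;

// Illustrative only: a table that remembers the serdes it was materialized
// with, so later stateful operations can build their repartition topics
// without asking the user for serdes again.
public final class MaterializedTableSketch<K, V> {
    private final Serde<K> keySerde;
    private final Serde<V> valueSerde;

    public MaterializedTableSketch(Serde<K> keySerde, Serde<V> valueSerde) {
        this.keySerde = keySerde;
        this.valueSerde = valueSerde;
    }

    // downstream operations reuse the captured serdes instead of new parameters
    public Serde<K> keySerde() { return keySerde; }
    public Serde<V> valueSerde() { return valueSerde; }
}
{code}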



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-03-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203464#comment-15203464
 ] 

Ismael Juma commented on KAFKA-3428:


Thanks for the report and interesting analysis. What was the impact of your 
changes in terms of throughput? Also, can you please share information on the 
hardware and JDK version?

> Remove metadata sync bottleneck from mirrormaker's producer
> ---
>
> Key: KAFKA-3428
> URL: https://issues.apache.org/jira/browse/KAFKA-3428
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>
> Due to sync on the single producer, MM in a setup with 32 consumer threads 
> could not send more than 
> 358k msg/sec and hence could not saturate the NIC. Profiling showed that 
> producer.send takes 0.080 ms on average, which explains the bottleneck of 
> 358k msg/sec. The following explains the bottleneck in producer.send and 
> suggests how to improve it.
> The current impl of MM relies on a single producer. For EACH message, 
> producer.send() calls waitOnMetadata, which runs the following code:
> {code}
> // add topic to metadata topic list if it is not there already.
> if (!this.metadata.containsTopic(topic))
> this.metadata.add(topic);
> {code}
> Although the code is mostly a no-op, containsTopic is synchronized, so it 
> becomes the bottleneck in MM.
> Profiling highlights this bottleneck:
> {code}
> 100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
>   18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   13.8% - 9,056 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
>   1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
>   2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
>   2.2% - 1,442 ms 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> {code}
> After replacing this bottleneck with a no-op, another run of the 
> profiler shows that fetch is the next bottleneck:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   132 s (54 %)   n/a n/a
>   java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
>   org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)n/a 
> n/a
>   6.8% - 16,546 ms 
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>   6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
>   6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
> {code}
> However, the fetch method does not need to be synchronized:
> {code}
> public synchronized Cluster fetch() {
> return this.cluster;
> }
> {code}
> Removing sync from the fetch method shows that the bottleneck disappears:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy   249 s (78 %)   n/a n/a
>   org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel  
>  24,489 ms (7 %)n/a n/a
>   org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)
> n/a n/a
>   org.apache.kafka.clients.producer.internals.RecordAccumulator.append
>  13,817 ms (4 %)n/a n/a
>   4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
> {code}
> Internally we have applied a patch to remove this bottleneck. The patch does 
> the following:
> 1. replace HashSet with a concurrent hash set
> 2. remove sync from containsTopic and fetch
> 3. pass a copy of topics to getClusterForCurrentTopics, since this 
> synchronized method accesses topics at two locations and topics being changed 
> in the middle might mess with the semantics.
> Any interest in applying this patch? Any alternative suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3428) Remove metadata sync bottleneck from mirrormaker's producer

2016-03-20 Thread Maysam Yabandeh (JIRA)
Maysam Yabandeh created KAFKA-3428:
--

 Summary: Remove metadata sync bottleneck from mirrormaker's 
producer
 Key: KAFKA-3428
 URL: https://issues.apache.org/jira/browse/KAFKA-3428
 Project: Kafka
  Issue Type: Improvement
Reporter: Maysam Yabandeh


Due to sync on the single producer, MM in a setup with 32 consumer threads 
could not send more than 358k msg/sec, hence not being able to saturate the NIC. 
Profiling showed that producer.send takes 0.080 ms on average, which explains the 
bottleneck of 358k msg/sec. The following explains the bottleneck in producer.send 
and suggests how to improve it.

The current impl of MM relies on a single producer. For EACH message, 
producer.send() calls waitOnMetadata, which runs the following synchronized 
method:
{code}
// add topic to metadata topic list if it is not there already.
if (!this.metadata.containsTopic(topic))
this.metadata.add(topic);
{code}
Although the code is mostly a noop, since containsTopic is synchronized it 
becomes the bottleneck in MM.

Profiling highlights this bottleneck:
{code}
100.0% - 65,539 ms kafka.tools.MirrorMaker$MirrorMakerThread.run
  18.9% - 12,403 ms org.apache.kafka.clients.producer.KafkaProducer.send
  13.8% - 9,056 ms 
org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
  12.1% - 7,933 ms org.apache.kafka.clients.Metadata.containsTopic
  1.7% - 1,088 ms org.apache.kafka.clients.Metadata.fetch
  2.6% - 1,729 ms org.apache.kafka.clients.Metadata.fetch
  2.2% - 1,442 ms 
org.apache.kafka.clients.producer.internals.RecordAccumulator.append
{code}

After replacing this bottleneck with a kind of noop, another run of the 
profiler shows that fetch is the next bottleneck:
{code}
org.xerial.snappy.SnappyNative.arrayCopy 132 s (54 %)   n/a n/a
java.lang.Thread.run 50,776 ms (21 %)   n/a n/a
org.apache.kafka.clients.Metadata.fetch  20,881 ms (8 %)n/a 
n/a
  6.8% - 16,546 ms 
org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
  6.8% - 16,546 ms org.apache.kafka.clients.producer.KafkaProducer.send
  6.8% - 16,546 ms kafka.tools.MirrorMaker$MirrorMakerProducer.send
{code}

However, the fetch method does not need to be synchronized:
{code}
public synchronized Cluster fetch() {
return this.cluster;
}
{code}
Removing sync from the fetch method shows that the bottleneck disappears:
{code}
org.xerial.snappy.SnappyNative.arrayCopy 249 s (78 %)   n/a n/a
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel  
 24,489 ms (7 %)n/a n/a
org.xerial.snappy.SnappyNative.rawUncompress 17,024 ms (5 %)
n/a n/a
org.apache.kafka.clients.producer.internals.RecordAccumulator.append
 13,817 ms (4 %)n/a n/a
  4.3% - 13,817 ms org.apache.kafka.clients.producer.KafkaProducer.send
{code}

Internally we have applied a patch to remove this bottleneck. The patch does 
the following (a rough standalone sketch follows below):
1. replace HashSet with a concurrent hash set
2. remove sync from containsTopic and fetch
3. pass a copy of topics to getClusterForCurrentTopics, since this 
synchronized method accesses topics at two locations and topics being changed 
in the middle might mess with the semantics.

Any interest in applying this patch? Any alternative suggestions?
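
For illustration only, here is a rough standalone sketch of the three changes above (plain Java; this is an illustrative class, not the actual org.apache.kafka.clients.Metadata):

{code}
// Illustrative metadata holder showing the gist of the described patch: a
// concurrent set instead of a synchronized HashSet, lock-free containsTopic and
// fetch, and a snapshot copy of the topics handed to the synchronized rebuild.
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class MetadataSketch {
    // 1. HashSet guarded by synchronized methods -> concurrent set
    private final Set<String> topics = ConcurrentHashMap.newKeySet();
    // stand-in for the Cluster reference; volatile so readers see the latest value
    private volatile Object cluster = new Object();

    // 2. hot-path checks no longer need the object monitor
    boolean containsTopic(String topic) { return topics.contains(topic); }
    void add(String topic) { topics.add(topic); }
    Object fetch() { return cluster; }

    // 3. pass a copy of the topics into the (still synchronized) rebuild, so the
    // set cannot change between the two places it is read
    synchronized void update(Object newClusterInfo) {
        Set<String> snapshot = new HashSet<>(topics);
        cluster = rebuildClusterFor(snapshot, newClusterInfo);
    }

    private Object rebuildClusterFor(Set<String> topicsSnapshot, Object newClusterInfo) {
        return newClusterInfo; // placeholder for getClusterForCurrentTopics(...)
    }
}
{code}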



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3427) broker can return incorrect version of fetch response when the broker hits an unknown exception

2016-03-20 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203370#comment-15203370
 ] 

Aditya Auradkar commented on KAFKA-3427:


[~junrao] - Certainly, I can patch 0.9. 

> broker can return incorrect version of fetch response when the broker hits an 
> unknown exception
> ---
>
> Key: KAFKA-3427
> URL: https://issues.apache.org/jira/browse/KAFKA-3427
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Jun Rao
>Assignee: Jun Rao
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> In FetchResponse.handleError(), we generate FetchResponse like the following, 
> which always defaults to version 0 of the response. 
> FetchResponse(correlationId, fetchResponsePartitionData)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3427) broker can return incorrect version of fetch response when the broker hits an unknown exception

2016-03-20 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203351#comment-15203351
 ] 

Jun Rao commented on KAFKA-3427:


[~aauradkar], do you think you could patch this for the 0.9.0 branch as well? 
In addition to patching FetchResponse.handleError(), we also need to patch 
ProduceResponse.handleError(), since we added the throttledTime in the response 
and we were still using the old Scala request/response in KafkaApis in 0.9.0. 
Thanks,

> broker can return incorrect version of fetch response when the broker hits an 
> unknown exception
> ---
>
> Key: KAFKA-3427
> URL: https://issues.apache.org/jira/browse/KAFKA-3427
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Jun Rao
>Assignee: Jun Rao
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> In FetchResponse.handleError(), we generate FetchResponse like the following, 
> which always defaults to version 0 of the response. 
> FetchResponse(correlationId, fetchResponsePartitionData)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3415) AdminOperationException when altering Topic with same number of partitions

2016-03-20 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199909#comment-15199909
 ] 

Ashish K Singh commented on KAFKA-3415:
---

I think the tool should be used to bump up the partition count, as it states. It 
should be the client's responsibility to use the tool only when the partition 
count actually has to be bumped up. Any reason why you cannot perform that 
check?

> AdminOperationException when altering Topic with same number of partitions
> --
>
> Key: KAFKA-3415
> URL: https://issues.apache.org/jira/browse/KAFKA-3415
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Affects Versions: 0.9.0.1
>Reporter: Gérald Quintana
>Priority: Minor
>
> To automate topic creation/modification, we sometimes run the kafka-topics.sh 
> script with the same topic config. It raises an AdminOperationException even 
> though the request changes nothing; in short, the operation is not idempotent.
> {code}
> bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic logfailed
> Topic:logfailed PartitionCount:1ReplicationFactor:1 
> Configs:retention.ms=60480,retention.bytes=209715200
> Topic: logfailedPartition: 0Leader: 1   Replicas: 1   
>   Isr: 1
> bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic logfailed 
> --partitions 1 --config retention.bytes=209715200 --config 
> retention.ms=60480
> WARNING: Altering topic configuration from this script has been deprecated 
> and may be removed in future releases.
>  Going forward, please use kafka-configs.sh for this functionality
> Updated config for topic "logfailed".
> WARNING: If partitions are increased for a topic that has a key, the 
> partition logic or ordering of the messages will be affected
> Error while executing topic command : The number of partitions for a topic 
> can only be increased
> [2016-03-17 12:25:20,458] ERROR kafka.admin.AdminOperationException: The 
> number of partitions for a topic can only be increased
> at kafka.admin.AdminUtils$.addPartitions(AdminUtils.scala:119)
> at 
> kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:139)
> at 
> kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:116)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at kafka.admin.TopicCommand$.alterTopic(TopicCommand.scala:116)
> at kafka.admin.TopicCommand$.main(TopicCommand.scala:62)
> at kafka.admin.TopicCommand.main(TopicCommand.scala)
>  (kafka.admin.TopicCommand$)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3188) Add system test for KIP-31 and KIP-32 - Compatibility Test

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200289#comment-15200289
 ] 

ASF GitHub Bot commented on KAFKA-3188:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1059


> Add system test for KIP-31 and KIP-32 - Compatibility Test
> --
>
> Key: KAFKA-3188
> URL: https://issues.apache.org/jira/browse/KAFKA-3188
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Eno Thereska
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> The integration test should test the compatibility between 0.10.0 broker with 
> clients on older versions. The clients version should include 0.9.0 and 0.8.x.
> We already cover 0.10 brokers with old producers/consumers in upgrade tests. 
> So, the main thing to test is a mix of 0.9 and 0.10 producers and consumers. 
> E.g., test1: 0.9 producer/0.10 consumer and then test2: 0.10 producer/0.9 
> consumer. And then, each of them: compression/no compression (like in upgrade 
> test). And we could probably add another dimension : topic configured with 
> CreateTime (default) and LogAppendTime. So, total 2x2x2 combinations (but 
> maybe can reduce that — eg. do LogAppendTime with compression only).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


KStreams Partition Assignment

2016-03-20 Thread Michael D. Coon
I'm evaluating whether the KafkaStreams API will be something we can use on my 
current project. Namely, we want to be able to distribute the consumers on a 
Mesos/YARN cluster. It's not entirely clear to me in the code what is deciding 
which partitions get assigned at runtime and whether this is intended for a 
distributed application or just a multi-threaded environment. 
I get that the consumer coordinator will get reassignments when group 
participation changes; however, in looking through the StreamPartitionAssignor 
code, it's not clear to me what is happening in the assign method. It looks 
to me like subscriptions are coming in from the consumer coordinator, whose 
assignments are presumably derived from the lead brokers for the topics 
of interest. Those subscriptions are then translated into co-partitioned groups 
of clients. Once that's complete, it hands off the co-partitioned groups to the 
StreamThread's partitionGrouper to do the work of assigning the partitions to 
each co-partitioned group. The DefaultPartitionGrouper code, starting on line 
57, simply does a 1-up assigning of partition to group. How will this actually 
work with distributed stream consumers if it's always going to be assigning the 
partition as a 1-up sequence local to that particular consumer? Shouldn't it 
use the assigned partition that is coming back from the ConsumerCoordinator? 
I'm struggling to understand the layers but I need to in order to know whether 
this implementation is going to work for us. If the PartitionGroupAssignor's 
default is just meant for single-node multithreaded use, that's fine as long as 
I can inject my own implementation. But I would still need to understand what 
is happening at the StreamPartitionAssignor layer more clearly. Any info, 
design docs, or in-progress wikis would be most appreciated if the answer is too 
in-depth for an email discussion. Thanks, and I really love what you guys are 
doing with this!
Mike

Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-03-20 Thread Dana Powers
I like it. I think the revised KIP-35 approach is simple, easy to
understand, and workable. I think it will solve the main problem that I
have had, and I assume other 3rd party client devs have had as well, and
should open the door to more backwards-compatible drivers and less
fracturing of client development.

As much as I think a single broker version could be easier to work with,
the conference call discussion convinced me that my thinking of broker
version is not totally aligned with the server development workflow and the
design ideas of the folks that have thought about this much more deeply
than I have.

I think I would make this approach work by looking at the released server
version documentation for each version that I am trying to support and test
against*, manually identifying the expected "protocol vectors" each supports,
storing that as a map of vectors to "broker versions", checking each vector at
runtime until I find a match, and writing code compatibility checks from
there. I admit that I am not sure whether this would work if I was trying
to support trunk deploys where the "fallback" vector may not be as clearly
defined. But I do think that if a client is targeting trunk it probably
isn't also targeting backwards compatibility, and so it likely would have
little reason to use this API anyways (unless for nicer error messages if
particular apis are unsupported on the target broker).
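
For illustration, a rough standalone sketch of that "map of vectors to broker versions" idea (plain Java; the api keys and version numbers below are placeholders, not an authoritative table):

{code}
// Match the per-API versions advertised by a broker against known release vectors,
// newest release first. Numbers here are illustrative only.
import java.util.LinkedHashMap;
import java.util.Map;

public class VersionVectorMatcher {
    // release name -> (apiKey -> max supported version), newest release first
    static final Map<String, Map<Short, Short>> KNOWN = new LinkedHashMap<>();
    static {
        Map<Short, Short> v0_10_0 = new LinkedHashMap<>();
        v0_10_0.put((short) 0, (short) 2); // Produce
        v0_10_0.put((short) 1, (short) 2); // Fetch
        KNOWN.put("0.10.0", v0_10_0);

        Map<Short, Short> v0_9_0 = new LinkedHashMap<>();
        v0_9_0.put((short) 0, (short) 1);
        v0_9_0.put((short) 1, (short) 1);
        KNOWN.put("0.9.0", v0_9_0);
    }

    // Return the newest known release whose vector the broker satisfies, else null
    // (the caller then falls back to its most conservative behaviour).
    static String match(Map<Short, Short> advertised) {
        for (Map.Entry<String, Map<Short, Short>> release : KNOWN.entrySet()) {
            boolean satisfied = true;
            for (Map.Entry<Short, Short> api : release.getValue().entrySet()) {
                Short max = advertised.get(api.getKey());
                if (max == null || max < api.getValue()) { satisfied = false; break; }
            }
            if (satisfied) return release.getKey();
        }
        return null;
    }
}
{code}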

I think Magnus has much deeper insight into this than I do, and I wish I
had time to test a patch this weekend. Hopefully I will be able to get to
that next week. But I'm confident that if it works for librdkafka, it will
work for most if not all other 3rd party clients.

-Dana

* FWIW, every push / pull request to kafka-python gets automatically tested
against a number of public kafka releases via travis-ci. Currently we test
against 0.8.0, 0.8.1.1, 0.8.2.2, and 0.9.0.1 .

On Thu, Mar 17, 2016 at 12:28 PM, Magnus Edenhill 
wrote:

> I don't really see how first checking if the broker supports a certain set
> of API versions
> is so much more complex than calling the same set of API versions during
> operation?
> The only difference is that without the check we might only make it halfway
> through,
> i.e., being able to join a group, consume messages, but then fail to commit
> them
>
> Another concrete example is support for KIP-31 and KIP-32:
> How will a client with support for 0.8.0-0.10.0 know when to use the new
> Message format?
> If a client uses the old Message format (v0) the broker will need to
> convert message formats
> in the slow path which will increase the broker CPU usage and decrease its
> performance.
> Sure, clients can expose a configuration property to let the user enable
> this but most
> users will miss this (there are a lot of config properties already).
>
>
>
>
> 2016-03-17 19:43 GMT+01:00 Gwen Shapira :
>
> > My problem with per-api is that not all combinations are supported. So
> the
> > client will need to figure out which "safe" combination to revert to -
> > which is a lot like global numbers, but less linear...
> >
> >
> > On Thu, Mar 17, 2016 at 11:29 AM, Joel Koshy 
> wrote:
> >
> > > @Gwen - yes it did come back to per-API versions - I think that
> happened
> > > toward the end of the KIP hangout. I already said this, but personally
> > I'm
> > > also in favor of per-API versions. A global/aggregate API version also
> > > works but that is just a proxy to a vector of per-API versions and at
> the
> > > end of the day a client implementation will need to somehow derive
> *some*
> > > per-API version vector to talk to the broker. The other challenges
> (that
> > > I'm sure can be solved) with the global API version are automatically
> > > computing it in the first place or if set manually, automatically
> > verifying
> > > that it was correctly bumped; then there are the nuances of API
> evolution
> > > across multiple branches and thereby the need for some convention to
> > manage
> > > trees of aggregate API versions.
> > >
> > > Jason and Jay had brought up (earlier in this thread) some practical
> > > difficulties in using per-API versions by client implementations that I
> > > think is worth quoting verbatim to make sure people are aware of the
> > > implications:
> > >
> > > Maybe you could try to capture feature
> > > > compatibility by checking the versions for a subset of request types?
> > For
> > > > example, to ensure that you can use the new consumer API, you check
> > that
> > > > the group coordinator request is present, the offset commit request
> > > version
> > > > is greater than 2, the offset fetch request is greater than 1, and
> the
> > > join
> > > > group request is present. And to ensure compatibility with KIP-32,
> > maybe
> > > > you only need to check the appropriate versions of the fetch and
> > produce
> > > > requests. That sounds kind of complicated to keep track of and you
> > > probably
> > > > 

[GitHub] kafka pull request: KAFKA-2832: Add a consumer config option to ex...

2016-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1082


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: KAFKA-2832: Add a consumer config option to ex...

2016-03-20 Thread edoardocomar
GitHub user edoardocomar opened a pull request:

https://github.com/apache/kafka/pull/1082

KAFKA-2832: Add a consumer config option to exclude internal topics

A new consumer config option 'exclude.internal.topics' was added to
allow excluding internal topics when wildcards are used to specify
topics to consume.
The new option takes a boolean value, with a default 'false' value (i.e.
no exclusion).

This patch is co-authored with @rajinisivaram @edoardocomar @mimaison
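
A minimal usage sketch, assuming the option lands with the name and default described above (broker address, group id and topic pattern are placeholders):

{code}
// Wildcard (pattern) subscription that opts out of internal topics such as
// __consumer_offsets via the proposed 'exclude.internal.topics' option.
import java.util.Collection;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class WildcardConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "wildcard-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("exclude.internal.topics", "true"); // skip internal topics

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Pattern.compile(".*"), new ConsumerRebalanceListener() {
            @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
            @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
        });
        // poll loop omitted
        consumer.close();
    }
}
{code}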

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/edoardocomar/kafka KAFKA-2832

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1082.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1082


commit ec62a3ad5e9076880b0e1c53891949f90b06c112
Author: Vahid Hashemian 
Date:   2016-03-09T21:48:12Z

KAFKA-2832: Add a consumer config option to exclude internal topics

A new consumer config option 'exclude.internal.topics' was added to
allow excluding internal topics when wildcards are used to specify
topics to consume.
The new option takes a boolean value, with a default 'false' value (i.e.
no exclusion).

This patch is co-authored with @rajinisivaram @edoardocomar @mimaison




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3417) Invalid characters in config properties not be validated

2016-03-20 Thread Byron Ruth (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Ruth updated KAFKA-3417:
--
Summary: Invalid characters in config properties not be validated  (was: 
Invalid characters in config not be caught early enough)

> Invalid characters in config properties not be validated
> 
>
> Key: KAFKA-3417
> URL: https://issues.apache.org/jira/browse/KAFKA-3417
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.9.0.1
>Reporter: Byron Ruth
>Priority: Minor
>
> I ran into an error using a {{client.id}} with invalid characters (per the 
> [config 
> validator|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/common/Config.scala#L25-L35]).
>  I was able to get that exact error using the {{kafka-console-consumer}} 
> script, presumably because I supplied a consumer properties file and it 
> validated prior to hitting the server. However, when I use a client library 
> (sarama for Go in this case), an error in the metrics subsystem is thrown 
> [here|https://github.com/apache/kafka/blob/977ebbe9bafb6c1a6e1be69620f745712118fe80/clients/src/main/java/org/apache/kafka/common/metrics/Metrics.java#L380].
> Assuming the cause is related to the invalid characters, shouldn't the 
> {{clientId}} be validated when the request header is decoded, prior to being used?
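
For illustration, a pre-check along the lines of the linked validator might look like the sketch below (the allowed character class is an assumption based on that validator, roughly letters, digits, '.', '_' and '-'):

{code}
// Standalone sketch of validating a client id before it is used anywhere.
import java.util.regex.Pattern;

public class ClientIdValidator {
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]*");

    public static void validate(String clientId) {
        if (clientId == null || !LEGAL.matcher(clientId).matches()) {
            throw new IllegalArgumentException("Illegal client id '" + clientId
                    + "': only letters, digits, '.', '_' and '-' are allowed");
        }
    }

    public static void main(String[] args) {
        validate("my-client.1");     // passes
        validate("bad client id!");  // throws IllegalArgumentException
    }
}
{code}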



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3415) AdminOperationException when altering Topic with same number of partitions

2016-03-20 Thread JIRA
Gérald Quintana created KAFKA-3415:
--

 Summary: AdminOperationException when altering Topic with same 
number of partitions
 Key: KAFKA-3415
 URL: https://issues.apache.org/jira/browse/KAFKA-3415
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Affects Versions: 0.9.0.1
Reporter: Gérald Quintana
Priority: Minor


To automate topic creation/modification, we sometimes run the kafka-topics.sh 
script with the same topic config. It raises an AdminOperationException even 
though the request changes nothing; in short, the operation is not idempotent.

{code}
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic logfailed
Topic:logfailed PartitionCount:1ReplicationFactor:1 
Configs:retention.ms=60480,retention.bytes=209715200
Topic: logfailedPartition: 0Leader: 1   Replicas: 1 
Isr: 1

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic logfailed 
--partitions 1 --config retention.bytes=209715200 --config 
retention.ms=60480
WARNING: Altering topic configuration from this script has been deprecated and 
may be removed in future releases.
 Going forward, please use kafka-configs.sh for this functionality
Updated config for topic "logfailed".
WARNING: If partitions are increased for a topic that has a key, the partition 
logic or ordering of the messages will be affected
Error while executing topic command : The number of partitions for a topic can 
only be increased
[2016-03-17 12:25:20,458] ERROR kafka.admin.AdminOperationException: The number 
of partitions for a topic can only be increased
at kafka.admin.AdminUtils$.addPartitions(AdminUtils.scala:119)
at 
kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:139)
at 
kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:116)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.admin.TopicCommand$.alterTopic(TopicCommand.scala:116)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:62)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
 (kafka.admin.TopicCommand$)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3410) Unclean leader election and "Halting because log truncation is not allowed"

2016-03-20 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198550#comment-15198550
 ] 

Gwen Shapira commented on KAFKA-3410:
-

1. I'd say "If you just lost all your data on the leader and only ISR member, 
you have no choice but to enable unclean leader election and lose a bunch of 
data. You can avoid getting into this unpleasant spot by setting 
min.insync.replicas=2 on topics where you want to avoid data loss in these 
scenarios"

> Unclean leader election and "Halting because log truncation is not allowed"
> ---
>
> Key: KAFKA-3410
> URL: https://issues.apache.org/jira/browse/KAFKA-3410
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>
> I ran into a scenario where one of my brokers would continually shutdown, 
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
> 5. At this point, both brokers are in the ISR. Broker1 is the partition 
> leader.
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets 
> dropped out of ISR. Broker1 is still the leader. I can still write data to 
> the partition.
> 7. Shutdown Broker1. Hard or controlled, doesn't matter.
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement or 
> full hardware replacement)
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed 
> because broker1 is down. At this point, the partition is offline. Can't write 
> to it.
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts 
> to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I am able to recover by setting unclean.leader.election.enable=true on my 
> brokers.
> I'm trying to understand a couple things:
> * In step 10, why is broker1 allowed to resume leadership even though it has 
> no data?
> * In step 10, why is it necessary to stop the entire broker due to one 
> partition that is in this state? Wouldn't it be possible for the broker to 
> continue to serve traffic for all the other topics, and just mark this one as 
> unavailable?
> * Would it make sense to allow an operator to manually specify which broker 
> they want to become the new master? This would give me more control over how 
> much data loss I am willing to handle. In this case, I would want broker2 to 
> become the new master. Or, is that possible and I just don't know how to do 
> it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Help: Producer with SSL did not work after upgrading the kafka 0.8.2 to Kafka 0.9

2016-03-20 Thread Qi Xu
Hi folks,
We just finished the upgrade from 0.8.2 to 0.9 following the instructions on the
Kafka web site (setting the protocol version to 0.8.2.x in the Kafka 0.9
server).
After the upgrade, we wanted to try the producer with an SSL endpoint, but it
never worked. Here's the error:

~/kafka_2.11-0.9.0.0$ ./bin/kafka-console-producer.sh --topic testtopic1
--broker-list  --producer.config  ./config/producer.properties
..
[2016-03-17 01:24:46,481] WARN Error while fetching metadata with
correlation id 0 : {testtopic1=UNKNOWN}
(org.apache.kafka.clients.NetworkClient)
[2016-03-17 01:24:46,613] WARN Error while fetching metadata with
correlation id 1 : {testtopic1=UNKNOWN}
(org.apache.kafka.clients.NetworkClient)
[2016-03-17 01:24:46,759] WARN Error while fetching metadata with
correlation id 2 : {testtopic1=UNKNOWN}
(org.apache.kafka.clients.NetworkClient)
[2016-03-17 01:24:46,901] WARN Error while fetching metadata with
correlation id 3 : {testtopic1=UNKNOWN}
(org.apache.kafka.clients.NetworkClient)
[2016-03-17 01:24:47,046] WARN Error while fetching metadata with
correlation id 4 : {testtopic1=UNKNOWN}
(org.apache.kafka.clients.NetworkClient)


In producer.properties, I specified all security information needed, as
below:

security.protocol=SSL
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
ssl.keystore.type=JKS
ssl.keystore.location=/usr/share/kafka/config/server.keystore
ssl.keystore.password=Password
ssl.key.password=Password
ssl.truststore.type=JKS
ssl.truststore.location=/usr/share/kafka/config/server.keystore
ssl.truststore.password=Password

In server side, I don't see any obvious error.
Any idea or hint is very appreciated. Thanks a lot.


Thanks,
Qi


[jira] [Commented] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup

2016-03-20 Thread Vamsi Subhash Achanta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199544#comment-15199544
 ] 

Vamsi Subhash Achanta commented on KAFKA-3359:
--

*bump*
Can this be reviewed?

> Parallel log-recovery of un-flushed segments on startup
> ---
>
> Key: KAFKA-3359
> URL: https://issues.apache.org/jira/browse/KAFKA-3359
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.2.2, 0.9.0.1
>Reporter: Vamsi Subhash Achanta
>Assignee: Jay Kreps
> Fix For: 0.10.0.0
>
>
> On startup, currently the log segments within a logDir are loaded 
> sequentially when there is an unclean shutdown. This takes a lot of time, 
> as logSegment.recover(..) is called for every segment, and for brokers which 
> have many partitions the time taken will be very high (we have noticed 
> ~40 mins for 2k partitions).
> https://github.com/apache/kafka/pull/1035
> This pull request makes the log-segment loading parallel, with two 
> configurable properties, "log.recovery.threads" and 
> "log.recovery.max.interval.ms".
> Logic:
> 1. Have a thread pool of fixed size (log.recovery.threads)
> 2. Submit each logSegment recovery as a job to the thread pool and add the 
> returned future to a job list
> 3. Wait until all the jobs are done within the allowed time 
> (log.recovery.max.interval.ms - default set to Long.Max).
> 4. If they are done and the futures are all null (meaning that the jobs are 
> successfully completed), it is considered done.
> 5. If any of the recovery jobs failed, then it is logged and 
> LogRecoveryFailedException is thrown
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
> The logic is backward compatible with the current sequential implementation, 
> as the default thread count is set to 1 (a rough sketch of the thread-pool 
> pattern follows below).
> PS: I am new to Scala and the code might look Java-ish, but I will be happy to 
> make changes based on code review.
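
A rough standalone sketch of that thread-pool-with-deadline pattern (plain Java, not the actual patch; the property values and "recovery" work are placeholders):

{code}
// Submit one recovery job per segment to a fixed pool, wait up to a deadline,
// and fail if any job fails or the deadline passes.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParallelRecoverySketch {
    public static void main(String[] args) throws InterruptedException {
        int recoveryThreads = 4;     // stands in for log.recovery.threads
        long maxIntervalMs = 60_000; // stands in for log.recovery.max.interval.ms

        List<String> segments = Arrays.asList("segment-0", "segment-1", "segment-2");
        ExecutorService pool = Executors.newFixedThreadPool(recoveryThreads);
        List<Future<?>> jobs = new ArrayList<>();
        for (String segment : segments) {
            jobs.add(pool.submit(() -> System.out.println("recovering " + segment)));
        }

        long deadline = System.currentTimeMillis() + maxIntervalMs;
        try {
            for (Future<?> job : jobs) {
                long remaining = Math.max(0, deadline - System.currentTimeMillis());
                job.get(remaining, TimeUnit.MILLISECONDS); // throws on failure or timeout
            }
        } catch (ExecutionException | TimeoutException e) {
            throw new RuntimeException("Log recovery failed", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
{code}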



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3378) Client blocks forever if SocketChannel connects instantly

2016-03-20 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-3378:

Reviewer: Jun Rao

> Client blocks forever if SocketChannel connects instantly
> -
>
> Key: KAFKA-3378
> URL: https://issues.apache.org/jira/browse/KAFKA-3378
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Larkin Lowrey
>Assignee: Larkin Lowrey
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Observed that some consumers were blocked in Fetcher.listOffset() when 
> starting many dozens of consumer threads at the same time.
> Selector.connect(...) calls SocketChannel.connect() in non-blocking mode and 
> assumes that false is always returned and that the channel will be in the 
> Selector's readyKeys once the connection is ready for connect completion due 
> to the OP_CONNECT interest op.
> When connect() returns true, the channel is fully connected and will 
> not be included in readyKeys since only OP_CONNECT is set.
> I implemented a fix which handles the case when connect(...) returns true and 
> verified that I no longer see stuck consumers. A git pull request will be 
> forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3316) Add Connect REST API to list available connector classes

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200321#comment-15200321
 ] 

ASF GitHub Bot commented on KAFKA-3316:
---

GitHub user Ishiihara opened a pull request:

https://github.com/apache/kafka/pull/1090

[KAFKA-3316]: Add REST API for listing connector plugins



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ishiihara/kafka kafka-3316

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1090


commit 2071b6223532279be05928965112a99782675a20
Author: Liquan Pei 
Date:   2016-03-16T22:09:32Z

add list connector plugins




> Add Connect REST API to list available connector classes
> 
>
> Key: KAFKA-3316
> URL: https://issues.apache.org/jira/browse/KAFKA-3316
> Project: Kafka
>  Issue Type: Bug
>  Components: copycat
>Reporter: Ewen Cheslack-Postava
>Assignee: Liquan Pei
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Each worker process's REST API should have an endpoint that can list 
> available connector classes. This can use the same Reflections code as we 
> used in KAFKA-2422 to find matching connector classes based on a short name. 
> This is useful both for debugging and for any systems that want to work with 
> different connect clusters and be able to tell which clusters support which 
> connectors.
> We may need a new top-level resource to support this. We have /connectors 
> already, but that refers to instantiated connectors that have been named. In 
> contrast, this resource would refer to the connector classes 
> (uninstantiated). We might be able to use the same resource to, e.g., lookup 
> config info in KAFKA-3315 (which occurs before connector instantiation).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3336) Unify ser/de pair classes into one serde class

2016-03-20 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3336:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1066
[https://github.com/apache/kafka/pull/1066]

> Unify ser/de pair classes into one serde class
> --
>
> Key: KAFKA-3336
> URL: https://issues.apache.org/jira/browse/KAFKA-3336
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Right now users must provide two separate classes for serializers and 
> deserializers, respectively.  This means the current API functions have at 
> least 2 * numberOfTypes parameters.
> *Example (current, bad): "foo(..., longSerializer, longDeserializer)".*
> Because the serde aspect of the API is already one of the biggest UX issues, 
> we should unify the serde functionality into a single serde class, i.e. one 
> class that provides both serialization and deserialization functionality.  
> This will reduce the number of required serde parameters in the API by 50%.
> *Example (suggested, better): "foo(..., longSerializerDeserializer)"*. 
> * Note: This parameter name is horrible and only used to highlight the 
> difference to the "current" example above.
> We also want to 1) add a pairing function for each operator that does not 
> require serialization and 2) add a default serde in the configs to make these 
> not required configs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3397: use -1(latest) as time default val...

2016-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1068


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3153) Serializer/Deserializer Registration and Type inference

2016-03-20 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-3153.
--
Resolution: Won't Fix

> Serializer/Deserializer Registration and Type inference
> ---
>
> Key: KAFKA-3153
> URL: https://issues.apache.org/jira/browse/KAFKA-3153
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kafka streams
>Reporter: Yasuhiro Matsuda
>Assignee: Yasuhiro Matsuda
> Fix For: 0.10.0.0
>
>
> This changes the way serializer/deserializer are selected by the framework. 
> The new scheme requires the app dev to register serializers/deserializers for 
> types using API. The framework infers the type of data from topology and uses 
> appropriate serializer/deserializer. This is best effort. Type inference is 
> not always possible due to Java's type erasure. If a type cannot be 
> determined, a user code can supply more information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: contributors list

2016-03-20 Thread Guozhang Wang
Eric,

What is your apache id?

Guozhang

On Tue, Mar 15, 2016 at 8:44 PM, Eric Wasserman 
wrote:

> Kafka Devs,
>
> I am working on a PR for https://issues.apache.org/jira/browse/KAFKA-1981
>  <
> https://issues.apache.org/jira/browse/KAFKA-1981 <
> https://issues.apache.org/jira/browse/KAFKA-1981>> and would like to be
> added to the Kafka contributors list so that I can assign the Jira issue to
> myself. I have already corresponded with Jay Kreps about the issue so I'm
> fairly optimistic that the PR will be on the right track.
>
> Regards,
>
> Eric Wasserman




-- 
-- Guozhang


Build failed in Jenkins: kafka-trunk-jdk7 #1129

2016-03-20 Thread Apache Jenkins Server
See 

Changes:

[cshapi] KAFKA-3328: SimpleAclAuthorizer can lose ACLs with frequent 
add/remov…

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H11 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision bfac36ad0e378b5f39e3889e40a75c5c1fc48fa7 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f bfac36ad0e378b5f39e3889e40a75c5c1fc48fa7
 > git rev-list eb823281a52f3b27c3a889e7412bc07b3024e688 # timeout=10
Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson8949507718812609937.sh
+ /bin/gradle
/tmp/hudson8949507718812609937.sh: line 2: /bin/gradle: No such file or 
directory
Build step 'Execute shell' marked build as failure
Recording test results
Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 2 days 7 hr old

Setting GRADLE_2_4_RC_2_HOME=
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51


[jira] [Commented] (KAFKA-3260) Increase the granularity of commit for SourceTask

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197502#comment-15197502
 ] 

ASF GitHub Bot commented on KAFKA-3260:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1080


> Increase the granularity of commit for SourceTask
> -
>
> Key: KAFKA-3260
> URL: https://issues.apache.org/jira/browse/KAFKA-3260
> Project: Kafka
>  Issue Type: Improvement
>  Components: copycat
>Affects Versions: 0.9.0.1
>Reporter: Jeremy Custenborder
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.0.0
>
>
> As of right now, when commit is called, the developer does not know which 
> messages have been accepted since the last poll. I'm proposing that we extend 
> the SourceTask class to allow records to be committed individually.
> {code}
> public void commitRecord(SourceRecord record) throws InterruptedException 
> {
> // This space intentionally left blank.
> }
> {code}
> This method could be overridden to receive a SourceRecord during the callback 
> of producer.send. This will give us messages that have been successfully 
> written to Kafka. The developer then has the capability to commit messages to 
> the source individually or in batch.   
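
As a minimal sketch of how a source task might use the proposed hook, assuming it lands with the signature shown above (the task below is purely illustrative, not a real connector):

{code}
// Illustrative source task that records which SourceRecords have been acked by
// the producer via the proposed commitRecord(SourceRecord) callback.
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class AckTrackingSourceTask extends SourceTask {
    private final Set<SourceRecord> acked =
            Collections.newSetFromMap(new ConcurrentHashMap<SourceRecord, Boolean>());

    @Override public String version() { return "0.1"; }
    @Override public void start(Map<String, String> props) { }
    @Override public void stop() { }

    @Override
    public List<SourceRecord> poll() {
        SourceRecord record = new SourceRecord(
                Collections.singletonMap("source", "demo"),
                Collections.singletonMap("offset", 0L),
                "demo-topic", Schema.STRING_SCHEMA, "hello");
        return Collections.singletonList(record);
    }

    // Proposed per-record callback: invoked once the producer has successfully
    // written this record, so the task can ack it against the upstream source.
    public void commitRecord(SourceRecord record) throws InterruptedException {
        acked.add(record);
    }
}
{code}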



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3409) Mirror maker hangs indefinitely due to commit

2016-03-20 Thread Richard Hillmann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201248#comment-15201248
 ] 

Richard Hillmann commented on KAFKA-3409:
-

I have a similar exception, running with two MirrorMakers (these have 
their own consumer groups). Both clusters run 0.9.1. 
Both stop working at the same time and freeze. Why does the MirrorMaker not 
stop completely? The process is still running, so the system assumes it is still 
working when it is not. 
The correct behavior would be to stop the process and give feedback to the system 
that the service is no longer running, or to retry connecting the consumer. 

{code:java}
[2016-03-18 00:21:30,749] WARN Auto offset commit failed: null 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2016-03-18 00:21:31,034] ERROR Error ILLEGAL_GENERATION occurred while 
committing offsets for group mirror_aws_to_rz1 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2016-03-18 00:21:31,303] FATAL [mirrormaker-thread-0] Mirror maker thread 
failure due to  (kafka.tools.MirrorMaker$MirrorMakerThread)
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed due to group rebalance
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
at 
kafka.tools.MirrorMaker$MirrorMakerNewConsumer.commit(MirrorMaker.scala:548)
at kafka.tools.MirrorMaker$.commitOffsets(MirrorMaker.scala:340)
at 
kafka.tools.MirrorMaker$MirrorMakerThread.maybeFlushAndCommitOffsets(MirrorMaker.scala:438)
at kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:407)
[2016-03-18 00:21:31,725] ERROR Error ILLEGAL_GENERATION occurred while 
committing offsets for group mirror_aws_to_rz1 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
Exception in thread "mirrormaker-thread-0" 
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed due to group rebalance
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at 

[jira] [Commented] (KAFKA-3415) AdminOperationException when altering Topic with same number of partitions

2016-03-20 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1510#comment-1510
 ] 

Edoardo Comar commented on KAFKA-3415:
--

As the user opening the JIRA stated, they use the scripts as part of 
automation. 
Having the script behave idempotently is actually simpler than checking the 
number of partitions and bumping up only if needed.

> AdminOperationException when altering Topic with same number of partitions
> --
>
> Key: KAFKA-3415
> URL: https://issues.apache.org/jira/browse/KAFKA-3415
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Affects Versions: 0.9.0.1
>Reporter: Gérald Quintana
>Priority: Minor
>
> To automate topic creation/modification, we sometimes run the kafka-topics.sh 
> script with the same topic config. It raises an AdminOperationException even 
> though the request changes nothing; in short, the operation is not idempotent.
> {code}
> bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic logfailed
> Topic:logfailed PartitionCount:1ReplicationFactor:1 
> Configs:retention.ms=60480,retention.bytes=209715200
> Topic: logfailedPartition: 0Leader: 1   Replicas: 1   
>   Isr: 1
> bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic logfailed 
> --partitions 1 --config retention.bytes=209715200 --config 
> retention.ms=60480
> WARNING: Altering topic configuration from this script has been deprecated 
> and may be removed in future releases.
>  Going forward, please use kafka-configs.sh for this functionality
> Updated config for topic "logfailed".
> WARNING: If partitions are increased for a topic that has a key, the 
> partition logic or ordering of the messages will be affected
> Error while executing topic command : The number of partitions for a topic 
> can only be increased
> [2016-03-17 12:25:20,458] ERROR kafka.admin.AdminOperationException: The 
> number of partitions for a topic can only be increased
> at kafka.admin.AdminUtils$.addPartitions(AdminUtils.scala:119)
> at 
> kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:139)
> at 
> kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:116)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at kafka.admin.TopicCommand$.alterTopic(TopicCommand.scala:116)
> at kafka.admin.TopicCommand$.main(TopicCommand.scala:62)
> at kafka.admin.TopicCommand.main(TopicCommand.scala)
>  (kafka.admin.TopicCommand$)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #458

2016-03-20 Thread Apache Jenkins Server
See 

Changes:

[cshapi] KAFKA-3328: SimpleAclAuthorizer can lose ACLs with frequent 
add/remov…

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H11 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision bfac36ad0e378b5f39e3889e40a75c5c1fc48fa7 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f bfac36ad0e378b5f39e3889e40a75c5c1fc48fa7
 > git rev-list eb823281a52f3b27c3a889e7412bc07b3024e688 # timeout=10
Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson8584753154853914202.sh
+ /bin/gradle
/tmp/hudson8584753154853914202.sh: line 2: /bin/gradle: No such file or 
directory
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 1 day 4 hr old

Setting 
JDK1_8_0_45_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_45
Setting GRADLE_2_4_RC_2_HOME=


[jira] [Commented] (KAFKA-3328) SimpleAclAuthorizer can lose ACLs with frequent add/remove calls

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203146#comment-15203146
 ] 

ASF GitHub Bot commented on KAFKA-3328:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1006


> SimpleAclAuthorizer can lose ACLs with frequent add/remove calls
> 
>
> Key: KAFKA-3328
> URL: https://issues.apache.org/jira/browse/KAFKA-3328
> Project: Kafka
>  Issue Type: Bug
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> Currently when adding or removing an ACL with the SimpleAclAuthorizer the 
> following high level steps happen:
> # read acls from cache
> # merge with the changed acls
> # update zookeeper
> # add a change notification
> Then the Authorizers listening for the change notification know to invalidate 
> their cache and get the latest value. However that takes some time. In the 
> time between the ACL change and the cache update, a new add or remove request 
> could be made. This will follow the steps listed above, and if the cache is 
> not correct all changes from the previous request are lost.
> This can be solved on a single node, by updating the cache at the same time 
> you update zookeeper any time a change is made. However, because there can be 
> multiple instances of the Authorizer, a request could come to a separate 
> authorizer and overwrite the ZooKeeper state, again losing changes from 
> earlier requests.
> To solve this on multiple instances, the authorizer could always read/write 
> state from zookeeper (instead of the cache) for add/remove requests and only 
> leverage the cache for get/authorize requests. Or it could block until all 
> the live instances have updated their cache. 
> Below is a log from a failed test in the WIP [pull 
> request|https://github.com/apache/kafka/pull/1005] for KAFKA-3266 that shows 
> this behavior:
> {noformat}
> [2016-03-03 11:09:20,714] DEBUG [KafkaApi-0] adding User:ANONYMOUS has Allow 
> permission for operations: Describe from hosts: * for Cluster:kafka-cluster 
> (kafka.server.KafkaApis:52)
> [2016-03-03 11:09:20,726] DEBUG updatedAcls: Set(User:ANONYMOUS has Allow 
> permission for operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,738] DEBUG [KafkaApi-0] adding User:ANONYMOUS has Deny 
> permission for operations: Describe from hosts: * for Cluster:kafka-cluster 
> (kafka.server.KafkaApis:52)
> [2016-03-03 11:09:20,739] DEBUG updatedAcls: Set(User:ANONYMOUS has Deny 
> permission for operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,752] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,755] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,762] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,768] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,773] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,777] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> {noformat}
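
For illustration only, a standalone sketch of the read-merge-write pattern described in the quoted issue and its single-node mitigation (plain Java; "AclStore" is a stand-in, not the SimpleAclAuthorizer code, and it does not address the multi-instance case):

{code}
// Racy vs. single-node-safe add: the racy path reads a possibly stale cache,
// merges, and overwrites the authoritative copy, losing concurrent changes.
import java.util.HashSet;
import java.util.Set;

class AclStore {
    private volatile Set<String> authoritative = new HashSet<>(); // stands in for ZooKeeper
    private volatile Set<String> cache = new HashSet<>();         // refreshed later by change notifications

    // Mirrors the steps in the ticket: 1) read acls from cache, 2) merge,
    // 3) overwrite "zookeeper". A stale cache silently drops earlier changes.
    void addAclRacy(String acl) {
        Set<String> merged = new HashSet<>(cache);
        merged.add(acl);
        authoritative = merged;
        // 4) a change notification would refresh the cache some time later
    }

    // Single-node mitigation from the ticket: merge against the authoritative
    // copy and refresh the cache in the same synchronized step.
    synchronized void addAclSafe(String acl) {
        Set<String> merged = new HashSet<>(authoritative);
        merged.add(acl);
        authoritative = merged;
        cache = merged;
    }

    public static void main(String[] args) {
        AclStore store = new AclStore();
        store.addAclSafe("User:alice Allow Describe");
        store.addAclSafe("User:bob Deny Describe");
        System.out.println(store.authoritative); // both ACLs retained
    }
}
{code}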



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3328: SimpleAclAuthorizer can lose ACLs ...

2016-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1006


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3328) SimpleAclAuthorizer can lose ACLs with frequent add/remove calls

2016-03-20 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-3328:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1006
[https://github.com/apache/kafka/pull/1006]

> SimpleAclAuthorizer can lose ACLs with frequent add/remove calls
> 
>
> Key: KAFKA-3328
> URL: https://issues.apache.org/jira/browse/KAFKA-3328
> Project: Kafka
>  Issue Type: Bug
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> Currently, when adding or removing an ACL with the SimpleAclAuthorizer, the 
> following high-level steps happen:
> # read acls from cache
> # merge with the changed acls
> # update zookeeper
> # add a change notification
> Then the Authorizers listening for the change notification know to invalidate 
> their cache and get the latest value. However, that takes some time. In the 
> time between the ACL change and the cache update, a new add or remove request 
> could be made. This will follow the steps listed above, and if the cache is 
> not correct, all changes from the previous request are lost.
> This can be solved on a single node by updating the cache at the same time 
> you update zookeeper any time a change is made. However, because there can be 
> multiple instances of the Authorizer, a request could come to a separate 
> authorizer and overwrite the Zookeeper state, again losing changes from 
> earlier requests.
> To solve this on multiple instances, the authorizer could always read/write 
> state from zookeeper (instead of the cache) for add/remove requests and only 
> leverage the cache for get/authorize requests. Or it could block until all 
> the live instances have updated their cache (a sketch of the conditional 
> read/write approach follows the log below).
> Below is a log from a failed test in the WIP [pull 
> request|https://github.com/apache/kafka/pull/1005] for KAFKA-3266 that shows 
> this behavior:
> {noformat}
> [2016-03-03 11:09:20,714] DEBUG [KafkaApi-0] adding User:ANONYMOUS has Allow 
> permission for operations: Describe from hosts: * for Cluster:kafka-cluster 
> (kafka.server.KafkaApis:52)
> [2016-03-03 11:09:20,726] DEBUG updatedAcls: Set(User:ANONYMOUS has Allow 
> permission for operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,738] DEBUG [KafkaApi-0] adding User:ANONYMOUS has Deny 
> permission for operations: Describe from hosts: * for Cluster:kafka-cluster 
> (kafka.server.KafkaApis:52)
> [2016-03-03 11:09:20,739] DEBUG updatedAcls: Set(User:ANONYMOUS has Deny 
> permission for operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,752] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,755] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,762] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,768] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,773] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> [2016-03-03 11:09:20,777] DEBUG Processing ACL change notification for 
> Cluster:kafka-cluster and Set(User:ANONYMOUS has Deny permission for 
> operations: Describe from hosts: *) 
> (kafka.security.auth.SimpleAclAuthorizer:52)
> {noformat}
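
The conditional read/write approach mentioned in the description can be sketched as a
compare-and-set loop. The snippet below is a minimal, self-contained illustration only;
the AclStore trait is hypothetical and stands in for the ZooKeeper-backed store (it is
not the real SimpleAclAuthorizer or ZkUtils API). Each add/remove re-reads the stored
ACL set together with its version and retries on a version conflict, so two concurrent
mutations cannot silently drop each other's changes:

{noformat}
// Hypothetical sketch of a compare-and-set update loop for ACL changes.
// AclStore is an assumed interface standing in for the ZooKeeper-backed
// storage; it is not the real SimpleAclAuthorizer/ZkUtils API.

case class Acl(principal: String, host: String, operation: String, permissionType: String)

trait AclStore {
  // Current ACL set for a resource plus a version (like a zk stat version).
  def read(resource: String): (Set[Acl], Int)
  // Writes only if the expected version still matches; returns true on success.
  def conditionalWrite(resource: String, acls: Set[Acl], expectedVersion: Int): Boolean
}

object AclUpdater {
  @annotation.tailrec
  def update(store: AclStore, resource: String)(change: Set[Acl] => Set[Acl]): Set[Acl] = {
    val (current, version) = store.read(resource) // read the store, not a possibly stale cache
    val updated = change(current)
    if (store.conditionalWrite(resource, updated, version)) updated
    else update(store, resource)(change)          // lost the race: re-read and retry
  }

  def addAcls(store: AclStore, resource: String, toAdd: Set[Acl]): Set[Acl] =
    update(store, resource)(_ ++ toAdd)

  def removeAcls(store: AclStore, resource: String, toRemove: Set[Acl]): Set[Acl] =
    update(store, resource)(_ -- toRemove)
}
{noformat}

Get/authorize calls would still be served from the cache; only mutations would take the
slower read-and-conditionally-write path against the store.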



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3400) Topic stop working / can't describe topic

2016-03-20 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199839#comment-15199839
 ] 

Ashish K Singh commented on KAFKA-3400:
---

Could you provide some details on the error you see? Also, could you try describing 
the topics with this patch applied when you see the error?

> Topic stop working / can't describe topic
> -
>
> Key: KAFKA-3400
> URL: https://issues.apache.org/jira/browse/KAFKA-3400
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Tobias
>Assignee: Ashish K Singh
> Fix For: 0.10.0.0
>
>
> We are seeing an issue where we intermittently (every couple of hours) get an 
> error with certain topics. They stop working and producers throw a 
> LeaderNotFoundException.
> When we then try to use kafka-topics.sh to describe the topic, we get the 
> error below:
> Error while executing topic command : next on empty iterator
> {noformat}
> [2016-03-15 17:30:26,231] ERROR java.util.NoSuchElementException: next on 
> empty iterator
>   at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
>   at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
>   at scala.collection.IterableLike$class.head(IterableLike.scala:91)
>   at scala.collection.AbstractIterable.head(Iterable.scala:54)
>   at 
> kafka.admin.TopicCommand$$anonfun$describeTopic$1.apply(TopicCommand.scala:198)
>   at 
> kafka.admin.TopicCommand$$anonfun$describeTopic$1.apply(TopicCommand.scala:188)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at kafka.admin.TopicCommand$.describeTopic(TopicCommand.scala:188)
>   at kafka.admin.TopicCommand$.main(TopicCommand.scala:66)
>   at kafka.admin.TopicCommand.main(TopicCommand.scala)
>  (kafka.admin.TopicCommand$)
> {noformat}
> If we delete the topic, it will start working again for a while.
> We can't see anything obvious in the logs, but we are happy to provide them if needed.
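
The stack trace points at a call to .head on an empty collection inside
TopicCommand.describeTopic (IterableLike.head ends up calling next() on an empty
iterator). The snippet below is only a self-contained illustration of that failure mode
and a defensive alternative; it is not the actual TopicCommand code, and the
partitionAssignment value is made up:

{noformat}
// Illustration of the failure mode in the stack trace above: .head on an empty
// collection throws NoSuchElementException, while .headOption lets the caller
// report the missing metadata instead of aborting the whole describe command.
// This is a standalone sketch, not the real kafka.admin.TopicCommand code.
object HeadOnEmptyExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical: no replica assignment known for a partition of the topic.
    val partitionAssignment: Iterable[Int] = Iterable.empty

    try {
      val leader = partitionAssignment.head // throws NoSuchElementException when empty
      println(s"leader: $leader")
    } catch {
      case e: NoSuchElementException => println(s"describe failed: $e")
    }

    // Defensive variant: surface the gap instead of crashing.
    partitionAssignment.headOption match {
      case Some(leader) => println(s"leader: $leader")
      case None         => println("no assignment found for partition (metadata incomplete?)")
    }
  }
}
{noformat}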



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3188: Compatibility test for old and new...

2016-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1059


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3330) Truncate log cleaner offset checkpoint if the log is truncated

2016-03-20 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-3330:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1009
[https://github.com/apache/kafka/pull/1009]

> Truncate log cleaner offset checkpoint if the log is truncated
> --
>
> Key: KAFKA-3330
> URL: https://issues.apache.org/jira/browse/KAFKA-3330
> Project: Kafka
>  Issue Type: Bug
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> We're getting a number of failures of the log compaction thread with the
> following error:
> 2016/02/02 00:13:58.832 [LogCleaner] Cleaner 0: Beginning cleaning of log
> __consumer_offsets-93.
> 2016/02/02 00:13:58.832 [LogCleaner] Cleaner 0: Building offset map for
> __consumer_offsets-93...
> 2016/02/02 00:13:59.048 [LogCleaner] Cleaner 0: Building offset map for log
> __consumer_offsets-93 for 2 segments in offset range [11951210572,
> 11952632314).
> 2016/02/02 00:13:59.066 [LogCleaner] [kafka-log-cleaner-thread-0], Error
> due to
> java.lang.IllegalArgumentException: requirement failed: Last clean offset
> is 11951210572 but segment base offset is 11950300163 for log
> __consumer_offsets-93.
> at scala.Predef$.require(Predef.scala:233) ~[scala-library-2.10.4.jar:?]
> at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:561)
> ~[kafka_2.10-0.8.2.56.jar:?]
> at kafka.log.Cleaner.clean(LogCleaner.scala:306)
> ~[kafka_2.10-0.8.2.56.jar:?]
> at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:217)
> ~[kafka_2.10-0.8.2.56.jar:?]
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:195)
> ~[kafka_2.10-0.8.2.56.jar:?]
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> ~[kafka_2.10-0.8.2.56.jar:?]
> 2016/02/02 00:13:59.066 [LogCleaner] [kafka-log-cleaner-thread-0], Stopped
> We found that this may be caused in the following scenario:
> - we have three log segments with offset range [100, 200), [200, 300), and 
> [300, 400) respectively. 300 is the base offset of the active log segment. 
> Log cleaner offset checkpoint is also 300.
> - After log is truncated to offset 220, the log segments become [100, 200), 
> [200, 220). The Log cleaner offset checkpoint is still 300.
> - After new messages are appended to the log, the log segments become [100, 
> 200), [200, 320), [320, 420). The Log cleaner offset checkpoint is still 300.
> - Log cleaner cleans the log starting at offset 300. The require(offset == 
> start) in Cleaner.buildOffsetMap() fails because the offset 300 is not 
> the base offset of any segment.
> To fix the problem, when the log is truncated to an offset smaller than the 
> cleaner offset checkpoint, we should reset the cleaner offset checkpoint to 
> the base offset of the active segment if this value is larger than the 
> checkpointed offset.
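
A minimal sketch of keeping the cleaner checkpoint consistent with truncation, assuming
a simple in-memory map of checkpoints. It is illustrative only, not the real
kafka.log.LogCleaner/LogCleanerManager code, and for simplicity it pulls the checkpoint
back to the truncation offset rather than to the active segment's base offset suggested
above; the invariant it restores is the same (the checkpoint must again point at an
offset that exists in the log):

{noformat}
// Hypothetical sketch: keep a cleaner checkpoint consistent with log truncation.
// Not the real LogCleanerManager code; checkpoints are modeled as a plain map
// from partition name to the first dirty offset recorded for the cleaner.
object CleanerCheckpointSketch {
  def maybeTruncateCheckpoint(checkpoints: Map[String, Long],
                              partition: String,
                              truncationOffset: Long): Map[String, Long] =
    checkpoints.get(partition) match {
      case Some(checkpoint) if checkpoint > truncationOffset =>
        // The log was truncated below the recorded checkpoint; pull the
        // checkpoint back so the next cleaning pass starts at a valid offset.
        checkpoints.updated(partition, truncationOffset)
      case _ =>
        checkpoints // checkpoint still valid, nothing to do
    }

  def main(args: Array[String]): Unit = {
    // Scenario from the description: checkpoint 300, log truncated to 220.
    val before = Map("__consumer_offsets-93" -> 300L)
    val after  = maybeTruncateCheckpoint(before, "__consumer_offsets-93", 220L)
    println(after) // Map(__consumer_offsets-93 -> 220)
  }
}
{noformat}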



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: MINOR: Fix FetchRequest.getErrorResponse for v...

2016-03-20 Thread granthenke
GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/1091

MINOR: Fix FetchRequest.getErrorResponse for version 1



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka fetch-error

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1091


commit 82324dee7c0b27ddd1636b4fbc6f088808d57d57
Author: Grant Henke 
Date:   2016-03-17T21:07:42Z

MINOR: Fix FetchRequest.getErrorResponse for version 1




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---