Re: Nagging - pending review requests :)

2015-05-29 Thread Joe Stein
Hey Jai, see below

On Fri, May 29, 2015 at 3:03 AM, Jaikiran Pai jai.forums2...@gmail.com
wrote:

 Hi Joe,

 Comments inline.

 On Friday 29 May 2015 12:15 PM, Joe Stein wrote:

 see below

 On Fri, May 29, 2015 at 2:25 AM, Jaikiran Pai jai.forums2...@gmail.com
 wrote:

  Could someone please look at these few review requests and let me know if
 any changes are needed:

 https://reviews.apache.org/r/34394/ related to
 https://issues.apache.org/jira/browse/KAFKA-1907


 I haven't looked at all the other changes that would be introduced from
 their release that could break between zk and kafka by introducing a zk
 client bump. A less ops negative way to deal with this might be to create
 a
 plugable interface, then someone can use a patched zkclient if they
 wanted,
 or exhibitor, or consul, or akka, etc.



 The ZkClient has already been bumped to this newer version as part of a
 separate task https://issues.apache.org/jira/browse/KAFKA-2169 and it's
 already in trunk. This change in my review request only passes along an
 (optional) value to the ZkClient constructor that was introduced in that
 newer version.


I left a comment in the review.





  https://reviews.apache.org/r/30403/ related to
 https://issues.apache.org/jira/browse/KAFKA-1906


 I don't understand the patch and how it would fix the issue. I also don't
 think necessarily there is an issue. Its a balance from the community
 having a good out of the box experience vs taking defaults and rushing
 them
 into production. No matter what we do we can't stop the latter from
 happening, which will also cause issues.


 The change to use a default directory that's within the Kafka installation
 path rather than /tmp folder (which get erased on restarts) is more from a
 development environment point of view rather than production. As you note,
 production environments will anyway have to deal with setting the right
 configs. From a developer perspective, I like the Kafka logs to survive
 system restarts when I'm working on applications which use Kafka. Of
 course, I can go ahead and change that default value in the
 server.properties on each fresh installation. But personally, I like it
 more if the logs are are stored within the Kafka installation itself so
 that even if I have multiple different versions of Kafka running (for
 different applications) on the same system, the logs are isolated to the
 Kafka installation and don't interfere with each other. We currently have a
 development setup where we have a bunch of VMs with different Kafka
 installations. These VMs are then handed out to developers to work on
 various different applications (which are under development). The first
 thing we currently do is edit the server.properties and update the log path
 (and that's the only change we do for dev). It would be much more easier
 and convenient/manageable if this log directory default to a path within
 the Kafka installation.


Developers like things to work when they try them out too. If there is
another way to have something other than /tmp be the default for log.dirs
and still run kinda everywhere folks want it too then lets discuss that as
a thread separately. If you have a proposal for what that is and how it
work you could submit it to
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals.
I think most developers that use Kafka use it because they have an eye to
production and they check and change things in the configs like the data
being saved to /tmp.  The relative dir is a tad scary especially when you
have log and kafka-logs which is which?

This will also be be _really_ confusing to people imho

-# A comma seperated list of directories under which to store log files
-log.dirs=/tmp/kafka-logs
+# A comma separated list of directories under which to store log files
+#log.dirs=






 There's also this one https://reviews.apache.org/r/34697/ for
 https://issues.apache.org/jira/browse/KAFKA-2221 but it's only been up
 since a couple of days and is a fairly minor one.


 Folks should start to transition in 0.8.3 to the new java consumer (which
 is on trunk). If this fix is so critical we should release it in 0.8.2.2
 otherwise continue to try to not make changes to the existing scalal
 consumer.


 Fair enough. It was more to help narrow down the real issues when a
 reconnect happens and isn't that critical. Do you want me to close that
 review request?


Your call. Folks may want to patch the change so knowing what version it is
for in the fix is helpful for them to-do that if they wanted. It is also
one less ticket to look at for folks.




 -Jaikiran



Re: Review Request 34394: Patch for KAFKA-1907

2015-05-29 Thread Jaikiran Pai


 On May 29, 2015, 7:11 a.m., Joe Stein wrote:
  core/src/main/scala/kafka/utils/ZkUtils.scala, line 39
  https://reviews.apache.org/r/34394/diff/2/?file=971310#file971310line39
 
  if we are going to add this it should be exposed as a configuration and 
  written up in a KIP. We can't hard code values that folks won't understand 
  without some clear information about why it is 5000

Ok. I'll read through the KIP process and create a new one.


- Jaikiran


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34394/#review85691
---


On May 25, 2015, 3:49 a.m., Jaikiran Pai wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34394/
 ---
 
 (Updated May 25, 2015, 3:49 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1907
 https://issues.apache.org/jira/browse/KAFKA-1907
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1907 Set operation retry timeout on ZkClient. Also mark certain Kafka 
 threads as daemon to allow proper JVM shutdown
 
 
 Diffs
 -
 
   clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
 f73eedb030987f018d8446bb1dcd98d19fa97331 
   core/src/main/scala/kafka/network/SocketServer.scala 
 edf6214278935c031cf493d72d266e715d43dd06 
   core/src/main/scala/kafka/server/DelayedOperation.scala 
 123078d97a7bfe2121655c00f3b2c6af21c53015 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 e66710d2368334ece66f70d55f57b3f888262620 
   core/src/main/scala/kafka/utils/ZkUtils.scala 
 78475e3d5ec477cef00caeaa34ff2d196466be96 
 
 Diff: https://reviews.apache.org/r/34394/diff/
 
 
 Testing
 ---
 
 ZkClient was recently upgraded to 0.5 version, as part of KAFKA-2169. The 0.5 
 version of ZkClient contains an enhancement which allows passing of operation 
 retry timeout https://github.com/sgroschupf/zkclient/pull/29. This now allows 
 us to fix the issue reported in 
 https://issues.apache.org/jira/browse/KAFKA-1907.
 
 The commit here passes the operation retry timeout while creating the 
 ZkClient instances. The commit was contains a change to mark certain threads 
 as daemon to allow a clean shutdown of the Kafka server when the zookeeper 
 instance has gone done first.
 
 I've locally tested that shutting down Kafka, after zookeeper has already 
 shutdown, works fine now (it tries to reconnect to zoookeeper for a maximum 
 of 5 seconds before cleanly shutting down). I've also checked that shutting 
 down Kafka first, when zookeeper is still up, works fine too.
 
 
 Thanks,
 
 Jaikiran Pai
 




RE: [VOTE] KIP-21 Dynamic Configuration

2015-05-29 Thread Aditya Auradkar
Yeah, the same cleaning mechanism will be carried over.

 1. Are we introducing a new Java API for the config change protocol and if
 so where will that appear? Is that going to be part of the java api in the
 admin api kip? Let's document that.
Yeah, we need to introduce a new Java API for the config change protocol. It 
should be a part of the AdminClient API. I'll alter KIP-4 to reflect that since 
the API is being introduced there.

 2. The proposed JSON format uses camel case for field names, is that what
 we've used for other JSON in zookeeper?
I think camel case is more appropriate for the JSON format. For example, under 
the brokers znode, I found jmx_port.

 3. This changes the format of the notifications, right? How will we
 grandfather in the old format? Clusters will have existing change
 notifications in the old format so I think the new code will need to be
 able to read those?
Interesting, I figured the existing notifications were purged by a cleaner 
thread frequently. In that case, we wouldn't need to grandfather any 
notifications since we would only need to not make any config changes for X 
minutes for there to be no changes in ZK. But the old notifications are 
actually removed when a new notification is received or the broker is bounced. 
So we do need to handle notifications in the old format. Should we simply 
ignore older changes since they are only valid for a short period of time?

Thanks,
Aditya

From: Jay Kreps [jay.kr...@gmail.com]
Sent: Thursday, May 28, 2015 5:25 PM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-21 Dynamic Configuration

That is handled now so I am assuming the same mechanism carries over?

-Jay

On Thu, May 28, 2015 at 5:12 PM, Guozhang Wang wangg...@gmail.com wrote:

 For the sequential config/changes/config_change_XX znode, do we have any
 manners to do cleaning in order to avoid the change-log from growing
 indefinitely?

 Guozhang

 On Thu, May 28, 2015 at 5:02 PM, Jay Kreps jay.kr...@gmail.com wrote:

  I still have a couple of questions:
  1. Are we introducing a new Java API for the config change protocol and
 if
  so where will that appear? Is that going to be part of the java api in
 the
  admin api kip? Let's document that.
  2. The proposed JSON format uses camel case for field names, is that what
  we've used for other JSON in zookeeper?
  3. This changes the format of the notifications, right? How will we
  grandfather in the old format? Clusters will have existing change
  notifications in the old format so I think the new code will need to be
  able to read those?
 
  -Jay
 
  On Thu, May 28, 2015 at 11:41 AM, Aditya Auradkar 
  aaurad...@linkedin.com.invalid wrote:
 
   bump
  
   
   From: Aditya Auradkar
   Sent: Tuesday, May 26, 2015 1:16 PM
   To: dev@kafka.apache.org
   Subject: RE: [VOTE] KIP-21 Dynamic Configuration
  
   Hey everyone,
  
   Completed the changes to KIP-4. After today's hangout, there doesn't
   appear to be anything remaining to discuss on this KIP.
   Please vote so we can formally close this.
  
   Thanks,
   Aditya
  
   
   From: Aditya Auradkar
   Sent: Thursday, May 21, 2015 11:26 AM
   To: dev@kafka.apache.org
   Subject: RE: [VOTE] KIP-21 Dynamic Configuration
  
   I think we should remove the config part in TopicMetadataResponse. It's
   probably cleaner if Alter and Describe are the only way to view and
  modify
   configs but I don't feel very strongly about it.
  
   Re-summarizing the proposed changes to KIP-4:
   - Change AlterTopic to not allow setting configs. Config changes will
  flow
   through AlterConfig. CreateTopic will still allow setting configs as it
  is
   nice to be able to specify configs while creating the topic.
   - TopicMetadataResponse shoudn't return config for the topic.
   DescribeConfig is the way to go.
   - Change InvalidTopicConfiguration error code to
 InvalidEntityConfig
   as proposed in KIP-21.
  
   Aditya
  
   
   From: Jun Rao [j...@confluent.io]
   Sent: Thursday, May 21, 2015 10:50 AM
   To: dev@kafka.apache.org
   Subject: Re: [VOTE] KIP-21 Dynamic Configuration
  
   What about TopicMetadataResponse in KIP-4? Do we remove the config part
  in
   it?
  
   Thanks,
  
   Jun
  
   On Thu, May 21, 2015 at 10:25 AM, Aditya Auradkar 
   aaurad...@linkedin.com.invalid wrote:
  
Hey Jun,
   
I've added a section on error codes on the KIP-21 wiki.
   
Here are the proposed changes to KIP-4. I'll update the wiki shortly.
- Change AlterTopic to not allow setting configs. Config changes will
   flow
through AlterConfig. CreateTopic will still allow setting configs as
 it
   is
nice to be able to specify configs while creating the topic.
- Change InvalidTopicConfiguration error code to
  InvalidEntityConfig
as proposed in KIP-21.
   
   
Thanks,
Aditya
   

Re: Review Request 34394: Patch for KAFKA-1907

2015-05-29 Thread Joe Stein

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34394/#review85691
---



core/src/main/scala/kafka/utils/ZkUtils.scala
https://reviews.apache.org/r/34394/#comment137408

if we are going to add this it should be exposed as a configuration and 
written up in a KIP. We can't hard code values that folks won't understand 
without some clear information about why it is 5000


- Joe Stein


On May 25, 2015, 3:49 a.m., Jaikiran Pai wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34394/
 ---
 
 (Updated May 25, 2015, 3:49 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1907
 https://issues.apache.org/jira/browse/KAFKA-1907
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1907 Set operation retry timeout on ZkClient. Also mark certain Kafka 
 threads as daemon to allow proper JVM shutdown
 
 
 Diffs
 -
 
   clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
 f73eedb030987f018d8446bb1dcd98d19fa97331 
   core/src/main/scala/kafka/network/SocketServer.scala 
 edf6214278935c031cf493d72d266e715d43dd06 
   core/src/main/scala/kafka/server/DelayedOperation.scala 
 123078d97a7bfe2121655c00f3b2c6af21c53015 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 e66710d2368334ece66f70d55f57b3f888262620 
   core/src/main/scala/kafka/utils/ZkUtils.scala 
 78475e3d5ec477cef00caeaa34ff2d196466be96 
 
 Diff: https://reviews.apache.org/r/34394/diff/
 
 
 Testing
 ---
 
 ZkClient was recently upgraded to 0.5 version, as part of KAFKA-2169. The 0.5 
 version of ZkClient contains an enhancement which allows passing of operation 
 retry timeout https://github.com/sgroschupf/zkclient/pull/29. This now allows 
 us to fix the issue reported in 
 https://issues.apache.org/jira/browse/KAFKA-1907.
 
 The commit here passes the operation retry timeout while creating the 
 ZkClient instances. The commit was contains a change to mark certain threads 
 as daemon to allow a clean shutdown of the Kafka server when the zookeeper 
 instance has gone done first.
 
 I've locally tested that shutting down Kafka, after zookeeper has already 
 shutdown, works fine now (it tries to reconnect to zoookeeper for a maximum 
 of 5 seconds before cleanly shutting down). I've also checked that shutting 
 down Kafka first, when zookeeper is still up, works fine too.
 
 
 Thanks,
 
 Jaikiran Pai
 




Re: Nagging - pending review requests :)

2015-05-29 Thread Jaikiran Pai

Hi Joe,

Comments inline.

On Friday 29 May 2015 12:15 PM, Joe Stein wrote:

see below

On Fri, May 29, 2015 at 2:25 AM, Jaikiran Pai jai.forums2...@gmail.com
wrote:


Could someone please look at these few review requests and let me know if
any changes are needed:

https://reviews.apache.org/r/34394/ related to
https://issues.apache.org/jira/browse/KAFKA-1907


I haven't looked at all the other changes that would be introduced from
their release that could break between zk and kafka by introducing a zk
client bump. A less ops negative way to deal with this might be to create a
plugable interface, then someone can use a patched zkclient if they wanted,
or exhibitor, or consul, or akka, etc.



The ZkClient has already been bumped to this newer version as part of a 
separate task https://issues.apache.org/jira/browse/KAFKA-2169 and it's 
already in trunk. This change in my review request only passes along an 
(optional) value to the ZkClient constructor that was introduced in that 
newer version.







https://reviews.apache.org/r/30403/ related to
https://issues.apache.org/jira/browse/KAFKA-1906


I don't understand the patch and how it would fix the issue. I also don't
think necessarily there is an issue. Its a balance from the community
having a good out of the box experience vs taking defaults and rushing them
into production. No matter what we do we can't stop the latter from
happening, which will also cause issues.


The change to use a default directory that's within the Kafka 
installation path rather than /tmp folder (which get erased on restarts) 
is more from a development environment point of view rather than 
production. As you note, production environments will anyway have to 
deal with setting the right configs. From a developer perspective, I 
like the Kafka logs to survive system restarts when I'm working on 
applications which use Kafka. Of course, I can go ahead and change that 
default value in the server.properties on each fresh installation. But 
personally, I like it more if the logs are are stored within the Kafka 
installation itself so that even if I have multiple different versions 
of Kafka running (for different applications) on the same system, the 
logs are isolated to the Kafka installation and don't interfere with 
each other. We currently have a development setup where we have a bunch 
of VMs with different Kafka installations. These VMs are then handed out 
to developers to work on various different applications (which are under 
development). The first thing we currently do is edit the 
server.properties and update the log path (and that's the only change we 
do for dev). It would be much more easier and convenient/manageable if 
this log directory default to a path within the Kafka installation.







There's also this one https://reviews.apache.org/r/34697/ for
https://issues.apache.org/jira/browse/KAFKA-2221 but it's only been up
since a couple of days and is a fairly minor one.


Folks should start to transition in 0.8.3 to the new java consumer (which
is on trunk). If this fix is so critical we should release it in 0.8.2.2
otherwise continue to try to not make changes to the existing scalal
consumer.



Fair enough. It was more to help narrow down the real issues when a 
reconnect happens and isn't that critical. Do you want me to close that 
review request?


-Jaikiran


RE: [VOTE] KIP-21 Dynamic Configuration

2015-05-29 Thread Aditya Auradkar
Minor edit: I meant that we should expect change notifications in the old 
format made earlier, but should perhaps ignore them. After the upgrade is done, 
older versions of AdminTools can no longer be used to make config changes.

Aditya


From: Aditya Auradkar
Sent: Thursday, May 28, 2015 11:22 PM
To: dev@kafka.apache.org
Subject: RE: [VOTE] KIP-21 Dynamic Configuration

Yeah, the same cleaning mechanism will be carried over.

 1. Are we introducing a new Java API for the config change protocol and if
 so where will that appear? Is that going to be part of the java api in the
 admin api kip? Let's document that.
Yeah, we need to introduce a new Java API for the config change protocol. It 
should be a part of the AdminClient API. I'll alter KIP-4 to reflect that since 
the API is being introduced there.

 2. The proposed JSON format uses camel case for field names, is that what
 we've used for other JSON in zookeeper?
I think camel case is more appropriate for the JSON format. For example, under 
the brokers znode, I found jmx_port.

 3. This changes the format of the notifications, right? How will we
 grandfather in the old format? Clusters will have existing change
 notifications in the old format so I think the new code will need to be
 able to read those?
Interesting, I figured the existing notifications were purged by a cleaner 
thread frequently. In that case, we wouldn't need to grandfather any 
notifications since we would only need to not make any config changes for X 
minutes for there to be no changes in ZK. But the old notifications are 
actually removed when a new notification is received or the broker is bounced. 
So we do need to handle notifications in the old format. Should we simply 
ignore older changes since they are only valid for a short period of time?

Thanks,
Aditya

From: Jay Kreps [jay.kr...@gmail.com]
Sent: Thursday, May 28, 2015 5:25 PM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-21 Dynamic Configuration

That is handled now so I am assuming the same mechanism carries over?

-Jay

On Thu, May 28, 2015 at 5:12 PM, Guozhang Wang wangg...@gmail.com wrote:

 For the sequential config/changes/config_change_XX znode, do we have any
 manners to do cleaning in order to avoid the change-log from growing
 indefinitely?

 Guozhang

 On Thu, May 28, 2015 at 5:02 PM, Jay Kreps jay.kr...@gmail.com wrote:

  I still have a couple of questions:
  1. Are we introducing a new Java API for the config change protocol and
 if
  so where will that appear? Is that going to be part of the java api in
 the
  admin api kip? Let's document that.
  2. The proposed JSON format uses camel case for field names, is that what
  we've used for other JSON in zookeeper?
  3. This changes the format of the notifications, right? How will we
  grandfather in the old format? Clusters will have existing change
  notifications in the old format so I think the new code will need to be
  able to read those?
 
  -Jay
 
  On Thu, May 28, 2015 at 11:41 AM, Aditya Auradkar 
  aaurad...@linkedin.com.invalid wrote:
 
   bump
  
   
   From: Aditya Auradkar
   Sent: Tuesday, May 26, 2015 1:16 PM
   To: dev@kafka.apache.org
   Subject: RE: [VOTE] KIP-21 Dynamic Configuration
  
   Hey everyone,
  
   Completed the changes to KIP-4. After today's hangout, there doesn't
   appear to be anything remaining to discuss on this KIP.
   Please vote so we can formally close this.
  
   Thanks,
   Aditya
  
   
   From: Aditya Auradkar
   Sent: Thursday, May 21, 2015 11:26 AM
   To: dev@kafka.apache.org
   Subject: RE: [VOTE] KIP-21 Dynamic Configuration
  
   I think we should remove the config part in TopicMetadataResponse. It's
   probably cleaner if Alter and Describe are the only way to view and
  modify
   configs but I don't feel very strongly about it.
  
   Re-summarizing the proposed changes to KIP-4:
   - Change AlterTopic to not allow setting configs. Config changes will
  flow
   through AlterConfig. CreateTopic will still allow setting configs as it
  is
   nice to be able to specify configs while creating the topic.
   - TopicMetadataResponse shoudn't return config for the topic.
   DescribeConfig is the way to go.
   - Change InvalidTopicConfiguration error code to
 InvalidEntityConfig
   as proposed in KIP-21.
  
   Aditya
  
   
   From: Jun Rao [j...@confluent.io]
   Sent: Thursday, May 21, 2015 10:50 AM
   To: dev@kafka.apache.org
   Subject: Re: [VOTE] KIP-21 Dynamic Configuration
  
   What about TopicMetadataResponse in KIP-4? Do we remove the config part
  in
   it?
  
   Thanks,
  
   Jun
  
   On Thu, May 21, 2015 at 10:25 AM, Aditya Auradkar 
   aaurad...@linkedin.com.invalid wrote:
  
Hey Jun,
   
I've added a section on error codes on the KIP-21 wiki.
   
Here are the proposed changes to KIP-4. 

[jira] [Commented] (KAFKA-188) Support multiple data directories

2015-05-29 Thread chenshangan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564610#comment-14564610
 ] 

chenshangan commented on KAFKA-188:
---

@Jay Kreps  I think we could provide a alternative, user can choose either one: 
Partitions determined or segments determined.

 Support multiple data directories
 -

 Key: KAFKA-188
 URL: https://issues.apache.org/jira/browse/KAFKA-188
 Project: Kafka
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Fix For: 0.8.0

 Attachments: KAFKA-188-v2.patch, KAFKA-188-v3.patch, 
 KAFKA-188-v4.patch, KAFKA-188-v5.patch, KAFKA-188-v6.patch, 
 KAFKA-188-v7.patch, KAFKA-188-v8.patch, KAFKA-188.patch


 Currently we allow only a single data directory. This means that a multi-disk 
 configuration needs to be a RAID array or LVM volume or something like that 
 to be mounted as a single directory.
 For a high-throughput low-reliability configuration this would mean RAID0 
 striping. Common wisdom in Hadoop land has it that a JBOD setup that just 
 mounts each disk as a separate directory and does application-level balancing 
 over these results in about 30% write-improvement. For example see this claim 
 here:
   http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
 It is not clear to me why this would be the case--it seems the RAID 
 controller should be able to balance writes as well as the application so it 
 may depend on the details of the setup.
 Nonetheless this would be really easy to implement, all you need to do is add 
 multiple data directories and balance partition creation over these disks.
 One problem this might cause is if a particular topic is much larger than the 
 others it might unbalance the load across the disks. The partition-disk 
 assignment policy should probably attempt to evenly spread each topic to 
 avoid this, rather than just trying keep the number of partitions balanced 
 between disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 34805: Patch for KAFKA-2213

2015-05-29 Thread Manikumar Reddy O

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34805/
---

Review request for kafka.


Bugs: KAFKA-2213
https://issues.apache.org/jira/browse/KAFKA-2213


Repository: kafka


Description
---

Write the compacted messages using the configured broker compression type.


Diffs
-

  core/src/main/scala/kafka/log/Log.scala 
84e7b8fe9dd014884b60c4fbe13c835cf02a40e4 
  core/src/main/scala/kafka/log/LogCleaner.scala 
c9ade7208798fbd92d4ff49e183fe5f8925c82a9 
  core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 
471ddff9bff1bdfa277c071e59e5c6b749b9c74f 

Diff: https://reviews.apache.org/r/34805/diff/


Testing
---


Thanks,

Manikumar Reddy O



[jira] [Updated] (KAFKA-2213) Log cleaner should write compacted messages using configured compression type

2015-05-29 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-2213:
---
Assignee: Manikumar Reddy
  Status: Patch Available  (was: Open)

 Log cleaner should write compacted messages using configured compression type
 -

 Key: KAFKA-2213
 URL: https://issues.apache.org/jira/browse/KAFKA-2213
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Manikumar Reddy
 Attachments: KAFKA-2213.patch


 In KAFKA-1374 the log cleaner was improved to handle compressed messages. 
 There were a couple of follow-ups from that:
 * We write compacted messages using the original compression type in the 
 compressed message-set. We should instead append all retained messages with 
 the configured broker compression type of the topic.
 * While compressing messages we should ideally do some batching before 
 compression.
 * Investigate the use of the client compressor. (See the discussion in the 
 RBs for KAFKA-1374)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2213) Log cleaner should write compacted messages using configured compression type

2015-05-29 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564676#comment-14564676
 ] 

Manikumar Reddy commented on KAFKA-2213:


Uploaded a patch which writes the compacted messages using the configured 
broker compression type.  

a) If the log contains messages with multiple compression types and configured 
broker compression type is producer, then  will write the messages with 
latest  message compression type.  
b) No special batching is introduced. Currently on each iteration , we will try 
to compact  a maximum of maxMessageSize bytes. So the compacted message will be 
less than maxMessageSize byes
c) Updated the LogIntegrationTest to include broker compression. This may not 
be required, as it increases the test run time. 

I will investigate Compressor usage in next patch.


 Log cleaner should write compacted messages using configured compression type
 -

 Key: KAFKA-2213
 URL: https://issues.apache.org/jira/browse/KAFKA-2213
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Manikumar Reddy
 Attachments: KAFKA-2213.patch


 In KAFKA-1374 the log cleaner was improved to handle compressed messages. 
 There were a couple of follow-ups from that:
 * We write compacted messages using the original compression type in the 
 compressed message-set. We should instead append all retained messages with 
 the configured broker compression type of the topic.
 * While compressing messages we should ideally do some batching before 
 compression.
 * Investigate the use of the client compressor. (See the discussion in the 
 RBs for KAFKA-1374)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations (Thread 2)

2015-05-29 Thread Andrii Biletskyi
Guys,

I won't be able to attend next meeting. But in the latest patch for KIP-4
Phase 1
I didn't even evolve TopicMetadataRequest to v1 since we won't be able
to change config with AlterTopicRequest, hence with this patch TMR will
still
return isr. Taking this into account I think yes - it would be good to fix
ISR issue,
although I didn't consider it to be a critical one (isr was part of TMR
from the very
beginning and almost no code relies on this piece of request).

Thanks,
Andrii Biletskyi

On Fri, May 29, 2015 at 8:50 AM, Aditya Auradkar 
aaurad...@linkedin.com.invalid wrote:

 Thanks. Perhaps we should leave TMR unchanged for now. Should we discuss
 this during the next hangout?

 Aditya

 
 From: Jun Rao [j...@confluent.io]
 Sent: Thursday, May 28, 2015 5:32 PM
 To: dev@kafka.apache.org
 Subject: Re: [DISCUSS] KIP-4 - Command line and centralized administrative
 operations (Thread 2)

 There is a reasonable use case of ISR in KAFKA-2225. Basically, for
 economical reasons, we may want to let a consumer fetch from a replica in
 ISR that's in the same zone. In order to support that, it will be
 convenient to have TMR return the correct ISR for the consumer to choose.

 So, perhaps it's worth fixing the ISR inconsistency issue in KAFKA-1367
 (there is some new discussion there on what it takes to fix this). If we do
 that, we can leave TMR unchanged.

 Thanks,

 Jun

 On Tue, May 26, 2015 at 1:13 PM, Aditya Auradkar 
 aaurad...@linkedin.com.invalid wrote:

  Andryii,
 
  I made a few edits to this document as discussed in the KIP-21 thread.
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
 
  With these changes. the only difference between TopicMetadataResponse_V1
  and V0 is the removal of the ISR field. I've altered the KIP with the
  assumption that this is a good enough reason by itself to evolve the
  request/response protocol. Any concerns there?
 
  Thanks,
  Aditya
 
  
  From: Mayuresh Gharat [gharatmayures...@gmail.com]
  Sent: Thursday, May 21, 2015 8:29 PM
  To: dev@kafka.apache.org
  Subject: Re: [DISCUSS] KIP-4 - Command line and centralized
 administrative
  operations (Thread 2)
 
  Hi Jun,
 
  Thanks a lot. I get it now.
   Point 4) will actually enable clients to who don't want to create a
 topic
  with default partitions, if it does not exist and then can manually
 create
  the topic with their own configs(#partitions).
 
  Thanks,
 
  Mayuresh
 
  On Thu, May 21, 2015 at 6:16 PM, Jun Rao j...@confluent.io wrote:
 
   Mayuresh,
  
   The current plan is the following.
  
   1. Add TMR v1, which still triggers auto topic creation.
   2. Change the consumer client to TMR v1. Change the producer client to
  use
   TMR v1 and on UnknownTopicException, issue TopicCreateRequest to
  explicitly
   create the topic with the default server side partitions and replicas.
   3. At some later time after the new clients are released and deployed,
   disable auto topic creation in TMR v1. This will make sure consumers
  never
   create new topics.
   4. If needed, we can add a new config in the producer to control
 whether
   TopicCreateRequest should be issued or not on UnknownTopicException. If
   this is disabled and the topic doesn't exist, send will fail and the
 user
   is expected to create the topic manually.
  
   Thanks,
  
   Jun
  
  
   On Thu, May 21, 2015 at 5:27 PM, Mayuresh Gharat 
   gharatmayures...@gmail.com
wrote:
  
Hi,
I had a question about TopicMetadata Request.
Currently the way it works is :
   
1) Suppose a topic T1 does not exist.
2) Client wants to produce data to T1 using producer P1.
3) Since T1 does not exist, P1 issues a TopicMetadata request to
 kafka.
This in turn creates the default number of partition. The number of
partitions is a cluster wide config.
4) Same goes for a consumer. If the topic does not exist and new
 topic
   will
be created when the consumer issues TopicMetadata request.
   
Here are 2 use cases where it might not be suited :
   
The auto creation flag for topics  is turned  ON.
   
a) Some clients might not want to create topic with default number of
partitions but with lower number of partitions. Currently in a
   multi-tenant
environment this is not possible without changing the cluster wide
   default
config.
   
b) Some clients might want to just check if the topic exist or not
 but
currently the topic gets created automatically using default number
 of
partitions.
   
Here are some ideas to address this :
   
1) The way this can be  addressed is that TopicMetadata request
 should
   have
a way to specify whether it should only check if the topic exist or
  check
and create a topic with given number of partitions. If the number of
partitions is not specified use the default cluster wide 

[jira] [Resolved] (KAFKA-2228) Delete me

2015-05-29 Thread Andrii Biletskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrii Biletskyi resolved KAFKA-2228.
-
Resolution: Duplicate

 Delete me
 -

 Key: KAFKA-2228
 URL: https://issues.apache.org/jira/browse/KAFKA-2228
 Project: Kafka
  Issue Type: Sub-task
Reporter: Andrii Biletskyi
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2227) Delete me

2015-05-29 Thread Andrii Biletskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrii Biletskyi resolved KAFKA-2227.
-
Resolution: Duplicate

 Delete me
 -

 Key: KAFKA-2227
 URL: https://issues.apache.org/jira/browse/KAFKA-2227
 Project: Kafka
  Issue Type: Sub-task
Reporter: Andrii Biletskyi
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2227) Delete me

2015-05-29 Thread Andrii Biletskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrii Biletskyi updated KAFKA-2227:

Summary: Delete me  (was: Phase 1: Requests and KafkaApis)

 Delete me
 -

 Key: KAFKA-2227
 URL: https://issues.apache.org/jira/browse/KAFKA-2227
 Project: Kafka
  Issue Type: Sub-task
Reporter: Andrii Biletskyi
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2228) Delete me

2015-05-29 Thread Andrii Biletskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrii Biletskyi reassigned KAFKA-2228:
---

Assignee: (was: Andrii Biletskyi)

 Delete me
 -

 Key: KAFKA-2228
 URL: https://issues.apache.org/jira/browse/KAFKA-2228
 Project: Kafka
  Issue Type: Sub-task
Reporter: Andrii Biletskyi
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2228) Delete me

2015-05-29 Thread Andrii Biletskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrii Biletskyi updated KAFKA-2228:

Summary: Delete me  (was: Phase 1: Requests and KafkaApis)

 Delete me
 -

 Key: KAFKA-2228
 URL: https://issues.apache.org/jira/browse/KAFKA-2228
 Project: Kafka
  Issue Type: Sub-task
Reporter: Andrii Biletskyi
Assignee: Andrii Biletskyi
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: Ignore gradle wrapper download directory

2015-05-29 Thread sslavic
GitHub user sslavic opened a pull request:

https://github.com/apache/kafka/pull/67

Ignore gradle wrapper download directory

This patch adds gradle wrapper download directory to .gitignore

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sslavic/kafka patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/67.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #67


commit cb43abda9fa60ad81d1d8df677a02e2f9a97a684
Author: Stevo Slavić ssla...@gmail.com
Date:   2015-05-29T14:03:34Z

Ignore gradle wrapper download directory




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations (Thread 2)

2015-05-29 Thread Ashish Singh
+1 on discussing this on next KIP hangout. I will update KIP-24 before that.

On Fri, May 29, 2015 at 3:40 AM, Andrii Biletskyi 
andrii.bilets...@stealth.ly wrote:

 Guys,

 I won't be able to attend next meeting. But in the latest patch for KIP-4
 Phase 1
 I didn't even evolve TopicMetadataRequest to v1 since we won't be able
 to change config with AlterTopicRequest, hence with this patch TMR will
 still
 return isr. Taking this into account I think yes - it would be good to fix
 ISR issue,
 although I didn't consider it to be a critical one (isr was part of TMR
 from the very
 beginning and almost no code relies on this piece of request).

 Thanks,
 Andrii Biletskyi

 On Fri, May 29, 2015 at 8:50 AM, Aditya Auradkar 
 aaurad...@linkedin.com.invalid wrote:

  Thanks. Perhaps we should leave TMR unchanged for now. Should we discuss
  this during the next hangout?
 
  Aditya
 
  
  From: Jun Rao [j...@confluent.io]
  Sent: Thursday, May 28, 2015 5:32 PM
  To: dev@kafka.apache.org
  Subject: Re: [DISCUSS] KIP-4 - Command line and centralized
 administrative
  operations (Thread 2)
 
  There is a reasonable use case of ISR in KAFKA-2225. Basically, for
  economical reasons, we may want to let a consumer fetch from a replica in
  ISR that's in the same zone. In order to support that, it will be
  convenient to have TMR return the correct ISR for the consumer to choose.
 
  So, perhaps it's worth fixing the ISR inconsistency issue in KAFKA-1367
  (there is some new discussion there on what it takes to fix this). If we
 do
  that, we can leave TMR unchanged.
 
  Thanks,
 
  Jun
 
  On Tue, May 26, 2015 at 1:13 PM, Aditya Auradkar 
  aaurad...@linkedin.com.invalid wrote:
 
   Andryii,
  
   I made a few edits to this document as discussed in the KIP-21 thread.
  
  
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
  
   With these changes. the only difference between
 TopicMetadataResponse_V1
   and V0 is the removal of the ISR field. I've altered the KIP with the
   assumption that this is a good enough reason by itself to evolve the
   request/response protocol. Any concerns there?
  
   Thanks,
   Aditya
  
   
   From: Mayuresh Gharat [gharatmayures...@gmail.com]
   Sent: Thursday, May 21, 2015 8:29 PM
   To: dev@kafka.apache.org
   Subject: Re: [DISCUSS] KIP-4 - Command line and centralized
  administrative
   operations (Thread 2)
  
   Hi Jun,
  
   Thanks a lot. I get it now.
Point 4) will actually enable clients to who don't want to create a
  topic
   with default partitions, if it does not exist and then can manually
  create
   the topic with their own configs(#partitions).
  
   Thanks,
  
   Mayuresh
  
   On Thu, May 21, 2015 at 6:16 PM, Jun Rao j...@confluent.io wrote:
  
Mayuresh,
   
The current plan is the following.
   
1. Add TMR v1, which still triggers auto topic creation.
2. Change the consumer client to TMR v1. Change the producer client
 to
   use
TMR v1 and on UnknownTopicException, issue TopicCreateRequest to
   explicitly
create the topic with the default server side partitions and
 replicas.
3. At some later time after the new clients are released and
 deployed,
disable auto topic creation in TMR v1. This will make sure consumers
   never
create new topics.
4. If needed, we can add a new config in the producer to control
  whether
TopicCreateRequest should be issued or not on UnknownTopicException.
 If
this is disabled and the topic doesn't exist, send will fail and the
  user
is expected to create the topic manually.
   
Thanks,
   
Jun
   
   
On Thu, May 21, 2015 at 5:27 PM, Mayuresh Gharat 
gharatmayures...@gmail.com
 wrote:
   
 Hi,
 I had a question about TopicMetadata Request.
 Currently the way it works is :

 1) Suppose a topic T1 does not exist.
 2) Client wants to produce data to T1 using producer P1.
 3) Since T1 does not exist, P1 issues a TopicMetadata request to
  kafka.
 This in turn creates the default number of partition. The number of
 partitions is a cluster wide config.
 4) Same goes for a consumer. If the topic does not exist and new
  topic
will
 be created when the consumer issues TopicMetadata request.

 Here are 2 use cases where it might not be suited :

 The auto creation flag for topics  is turned  ON.

 a) Some clients might not want to create topic with default number
 of
 partitions but with lower number of partitions. Currently in a
multi-tenant
 environment this is not possible without changing the cluster wide
default
 config.

 b) Some clients might want to just check if the topic exist or not
  but
 currently the topic gets created automatically using default number
  of
 partitions.

 Here are some ideas to 

[jira] [Comment Edited] (KAFKA-188) Support multiple data directories

2015-05-29 Thread chenshangan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564610#comment-14564610
 ] 

chenshangan edited comment on KAFKA-188 at 5/29/15 3:49 PM:


[~jkreps]  I think we could provide an alternative, user can choose either one: 
Partitions determined or segments determined.


was (Author: chenshangan...@163.com):
@Jay Kreps  I think we could provide a alternative, user can choose either one: 
Partitions determined or segments determined.

 Support multiple data directories
 -

 Key: KAFKA-188
 URL: https://issues.apache.org/jira/browse/KAFKA-188
 Project: Kafka
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Fix For: 0.8.0

 Attachments: KAFKA-188-v2.patch, KAFKA-188-v3.patch, 
 KAFKA-188-v4.patch, KAFKA-188-v5.patch, KAFKA-188-v6.patch, 
 KAFKA-188-v7.patch, KAFKA-188-v8.patch, KAFKA-188.patch


 Currently we allow only a single data directory. This means that a multi-disk 
 configuration needs to be a RAID array or LVM volume or something like that 
 to be mounted as a single directory.
 For a high-throughput low-reliability configuration this would mean RAID0 
 striping. Common wisdom in Hadoop land has it that a JBOD setup that just 
 mounts each disk as a separate directory and does application-level balancing 
 over these results in about 30% write-improvement. For example see this claim 
 here:
   http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
 It is not clear to me why this would be the case--it seems the RAID 
 controller should be able to balance writes as well as the application so it 
 may depend on the details of the setup.
 Nonetheless this would be really easy to implement, all you need to do is add 
 multiple data directories and balance partition creation over these disks.
 One problem this might cause is if a particular topic is much larger than the 
 others it might unbalance the load across the disks. The partition-disk 
 assignment policy should probably attempt to evenly spread each topic to 
 avoid this, rather than just trying keep the number of partitions balanced 
 between disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2015-05-29 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564980#comment-14564980
 ] 

Ashish K Singh commented on KAFKA-1367:
---

[~junrao] can we add this to the agenda of next KIP hangout?

 Broker topic metadata not kept in sync with ZooKeeper
 -

 Key: KAFKA-1367
 URL: https://issues.apache.org/jira/browse/KAFKA-1367
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Ryan Berdeen
Assignee: Ashish K Singh
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1367.txt


 When a broker is restarted, the topic metadata responses from the brokers 
 will be incorrect (different from ZooKeeper) until a preferred replica leader 
 election.
 In the metadata, it looks like leaders are correctly removed from the ISR 
 when a broker disappears, but followers are not. Then, when a broker 
 reappears, the ISR is never updated.
 I used a variation of the Vagrant setup created by Joe Stein to reproduce 
 this with latest from the 0.8.1 branch: 
 https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34734: Patch for KAFKA-2226

2015-05-29 Thread Yasuhiro Matsuda


 On May 29, 2015, 1:56 a.m., Jun Rao wrote:
  core/src/main/scala/kafka/utils/timer/TimerTaskList.scala, line 106
  https://reviews.apache.org/r/34734/diff/2/?file=973959#file973959line106
 
  Since this is under synchronized, it seems that remove should always 
  return true?

Oh. You are right. I am not sure what I was thinking.


 On May 29, 2015, 1:56 a.m., Jun Rao wrote:
  core/src/main/scala/kafka/utils/timer/TimerTaskList.scala, line 153
  https://reviews.apache.org/r/34734/diff/2/?file=973959#file973959line153
 
  Not sure if I follow this comment.

I meant To cancel a task, it should be removed by calling cancel() to prevent 
it from reinsert.


 On May 29, 2015, 1:56 a.m., Jun Rao wrote:
  core/src/main/scala/kafka/utils/timer/TimerTaskList.scala, lines 136-137
  https://reviews.apache.org/r/34734/diff/2/?file=973959#file973959line136
 
  With this canceled flag, the logic is a bit more complicated since a 
  few other places need to check this flag. Not sure how much this helps in 
  reducing the probability of having a cancelled operation reinserted into 
  the list. Do you think it's worth doing this?

I believe that this will significantly eliminate canceled task being reinserted 
to a timing wheel or submitted to the task executor.


- Yasuhiro


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/#review85665
---


On May 29, 2015, 12:19 a.m., Yasuhiro Matsuda wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34734/
 ---
 
 (Updated May 29, 2015, 12:19 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2226
 https://issues.apache.org/jira/browse/KAFKA-2226
 
 
 Repository: kafka
 
 
 Description
 ---
 
 fix a race condition in TimerTaskEntry.remove
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/utils/timer/Timer.scala 
 b8cde820a770a4e894804f1c268b24b529940650 
   core/src/main/scala/kafka/utils/timer/TimerTask.scala 
 3407138115d579339ffb6b00e32e38c984ac5d6e 
   core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
 e7a96570ddc2367583d6d5590628db7e7f6ba39b 
   core/src/main/scala/kafka/utils/timer/TimingWheel.scala 
 e92aba3844dbf3372182e14536a5d98cf3366d73 
 
 Diff: https://reviews.apache.org/r/34734/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Yasuhiro Matsuda
 




Re: Review Request 34734: Patch for KAFKA-2226

2015-05-29 Thread Yasuhiro Matsuda

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/
---

(Updated May 29, 2015, 5:49 p.m.)


Review request for kafka.


Bugs: KAFKA-2226
https://issues.apache.org/jira/browse/KAFKA-2226


Repository: kafka


Description
---

fix a race condition in TimerTaskEntry.remove


Diffs (updated)
-

  core/src/main/scala/kafka/utils/timer/Timer.scala 
b8cde820a770a4e894804f1c268b24b529940650 
  core/src/main/scala/kafka/utils/timer/TimerTask.scala 
3407138115d579339ffb6b00e32e38c984ac5d6e 
  core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
e7a96570ddc2367583d6d5590628db7e7f6ba39b 
  core/src/main/scala/kafka/utils/timer/TimingWheel.scala 
e92aba3844dbf3372182e14536a5d98cf3366d73 

Diff: https://reviews.apache.org/r/34734/diff/


Testing
---


Thanks,

Yasuhiro Matsuda



[jira] [Comment Edited] (KAFKA-2213) Log cleaner should write compacted messages using configured compression type

2015-05-29 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564676#comment-14564676
 ] 

Manikumar Reddy edited comment on KAFKA-2213 at 5/29/15 6:58 PM:
-

Uploaded a patch which writes the compacted messages using the configured 
broker compression type.  

a) If the log contains messages with multiple compression types and configured 
broker compression type is producer, then  will write the messages with 
latest  message compression type.  
b) No special batching is introduced. Currently on each iteration , we will try 
to compact  a maximum of maxMessageSize bytes. So the compacted message will be 
less than maxMessageSize byes
c) Updated the LogIntegrationTest to include broker compression. This may not 
be required, as it increases the test run time. 
d) Used client MemoryRecords/Compressor classes



was (Author: omkreddy):
Uploaded a patch which writes the compacted messages using the configured 
broker compression type.  

a) If the log contains messages with multiple compression types and configured 
broker compression type is producer, then  will write the messages with 
latest  message compression type.  
b) No special batching is introduced. Currently on each iteration , we will try 
to compact  a maximum of maxMessageSize bytes. So the compacted message will be 
less than maxMessageSize byes
c) Updated the LogIntegrationTest to include broker compression. This may not 
be required, as it increases the test run time. 

I will investigate Compressor usage in next patch.


 Log cleaner should write compacted messages using configured compression type
 -

 Key: KAFKA-2213
 URL: https://issues.apache.org/jira/browse/KAFKA-2213
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Manikumar Reddy
 Attachments: KAFKA-2213.patch, KAFKA-2213_2015-05-30_00:23:01.patch


 In KAFKA-1374 the log cleaner was improved to handle compressed messages. 
 There were a couple of follow-ups from that:
 * We write compacted messages using the original compression type in the 
 compressed message-set. We should instead append all retained messages with 
 the configured broker compression type of the topic.
 * While compressing messages we should ideally do some batching before 
 compression.
 * Investigate the use of the client compressor. (See the discussion in the 
 RBs for KAFKA-1374)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[Bump] Code review for KIP-11

2015-05-29 Thread Parth Brahmbhatt
Hi,

Can someone please review the following CRs:

Public entities and interfaces with changes to KafkaAPI and KafkaServer: 
https://reviews.apache.org/r/34492/diff/
Actual Implementation: https://reviews.apache.org/r/34493/diff/
CLI: https://reviews.apache.org/r/34494/diff/

Thanks
Parth


[jira] [Updated] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance

2015-05-29 Thread Yasuhiro Matsuda (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yasuhiro Matsuda updated KAFKA-2226:

Attachment: KAFKA-2226_2015-05-29_10:49:34.patch

 NullPointerException in TestPurgatoryPerformance
 

 Key: KAFKA-2226
 URL: https://issues.apache.org/jira/browse/KAFKA-2226
 Project: Kafka
  Issue Type: Bug
Reporter: Onur Karaman
Assignee: Yasuhiro Matsuda
 Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, 
 KAFKA-2226_2015-05-29_10:49:34.patch


 A NullPointerException sometimes shows up in TimerTaskList.remove while 
 running TestPurgatoryPerformance. I’m on the HEAD of trunk.
 {code}
  ./bin/kafka-run-class.sh kafka.TestPurgatoryPerformance --key-space-size 
  10 --keys 3 --num 10 --pct50 50 --pct75 75 --rate 1000 --size 50 
  --timeout 20
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 [2015-05-27 10:02:14,782] ERROR [completion thread], Error due to  
 (kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1)
 java.lang.NullPointerException
   at kafka.utils.timer.TimerTaskList.remove(TimerTaskList.scala:80)
   at kafka.utils.timer.TimerTaskEntry.remove(TimerTaskList.scala:128)
   at kafka.utils.timer.TimerTask$class.cancel(TimerTask.scala:27)
   at kafka.server.DelayedOperation.cancel(DelayedOperation.scala:50)
   at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:71)
   at 
 kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1.doWork(TestPurgatoryPerformance.scala:263)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34734: Patch for KAFKA-2226

2015-05-29 Thread Yasuhiro Matsuda


 On May 28, 2015, 7:10 p.m., Jun Rao wrote:
  core/src/main/scala/kafka/utils/timer/TimerTaskList.scala, lines 64-65
  https://reviews.apache.org/r/34734/diff/1/?file=973063#file973063line64
 
  Could you explain a bit why this is needed? It seems that we can add 
  the entry either when it's created for the first time or when it's removed 
  from the current list and needs to be added to a new list during reinsert. 
  In both cases, the list in the entry will be null and there is no need to 
  remove the entry from the list.

I will remove this.


- Yasuhiro


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/#review85596
---


On May 29, 2015, 5:49 p.m., Yasuhiro Matsuda wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34734/
 ---
 
 (Updated May 29, 2015, 5:49 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2226
 https://issues.apache.org/jira/browse/KAFKA-2226
 
 
 Repository: kafka
 
 
 Description
 ---
 
 fix a race condition in TimerTaskEntry.remove
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/utils/timer/Timer.scala 
 b8cde820a770a4e894804f1c268b24b529940650 
   core/src/main/scala/kafka/utils/timer/TimerTask.scala 
 3407138115d579339ffb6b00e32e38c984ac5d6e 
   core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
 e7a96570ddc2367583d6d5590628db7e7f6ba39b 
   core/src/main/scala/kafka/utils/timer/TimingWheel.scala 
 e92aba3844dbf3372182e14536a5d98cf3366d73 
 
 Diff: https://reviews.apache.org/r/34734/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Yasuhiro Matsuda
 




[jira] [Commented] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance

2015-05-29 Thread Yasuhiro Matsuda (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565133#comment-14565133
 ] 

Yasuhiro Matsuda commented on KAFKA-2226:
-

Updated reviewboard https://reviews.apache.org/r/34734/diff/
 against branch origin/trunk

 NullPointerException in TestPurgatoryPerformance
 

 Key: KAFKA-2226
 URL: https://issues.apache.org/jira/browse/KAFKA-2226
 Project: Kafka
  Issue Type: Bug
Reporter: Onur Karaman
Assignee: Yasuhiro Matsuda
 Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, 
 KAFKA-2226_2015-05-29_10:49:34.patch


 A NullPointerException sometimes shows up in TimerTaskList.remove while 
 running TestPurgatoryPerformance. I’m on the HEAD of trunk.
 {code}
  ./bin/kafka-run-class.sh kafka.TestPurgatoryPerformance --key-space-size 
  10 --keys 3 --num 10 --pct50 50 --pct75 75 --rate 1000 --size 50 
  --timeout 20
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 [2015-05-27 10:02:14,782] ERROR [completion thread], Error due to  
 (kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1)
 java.lang.NullPointerException
   at kafka.utils.timer.TimerTaskList.remove(TimerTaskList.scala:80)
   at kafka.utils.timer.TimerTaskEntry.remove(TimerTaskList.scala:128)
   at kafka.utils.timer.TimerTask$class.cancel(TimerTask.scala:27)
   at kafka.server.DelayedOperation.cancel(DelayedOperation.scala:50)
   at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:71)
   at 
 kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1.doWork(TestPurgatoryPerformance.scala:263)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33196: Patch for KAFKA-2123

2015-05-29 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33196/
---

(Updated May 29, 2015, 6:11 p.m.)


Review request for kafka.


Bugs: KAFKA-2123
https://issues.apache.org/jira/browse/KAFKA-2123


Repository: kafka


Description
---

KAFKA-2123: Add queuing of offset commit requests.


KAFKA-2123: Add scheduler for delayed tasks in new consumer, add backoff for 
commit retries, and simplify auto commit by using delayed tasks.


KAFKA-2123: Make synchronous offset commits wait for previous requests to 
finish in order.


KAFKA-2123: Remove redundant calls to ensureNotClosed


KAFKA-2123: Address review comments.


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
8f587bc0705b65b3ef37c86e0c25bb43ab8803de 
  
clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerCommitCallback.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
bdff518b732105823058e6182f445248b45dc388 
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
d301be4709f7b112e1f3a39f3c04cfa65f00fa60 
  clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
f50da825756938c193d7f07bee953e000e2627d9 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
 b2764df11afa7a99fce46d1ff48960d889032d14 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/DelayedTask.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/DelayedTaskQueue.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/common/errors/ConsumerCoordinatorNotAvailableException.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/common/errors/NotCoordinatorForConsumerException.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/common/errors/OffsetLoadInProgressException.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/common/protocol/Errors.java 
5b898c8f8ad5d0469f469b600c4b2eb13d1fc662 
  
clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java
 b06c4a73e2b4e9472cd772c8bc32bf4a29f431bb 
  
clients/src/test/java/org/apache/kafka/clients/consumer/internals/DelayedTaskQueueTest.java
 PRE-CREATION 
  core/src/test/scala/integration/kafka/api/ConsumerTest.scala 
a1eed965a148eb19d9a6cefbfce131f58aaffc24 

Diff: https://reviews.apache.org/r/33196/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2015-05-29 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565078#comment-14565078
 ] 

Jun Rao commented on KAFKA-1367:


Yes.

 Broker topic metadata not kept in sync with ZooKeeper
 -

 Key: KAFKA-1367
 URL: https://issues.apache.org/jira/browse/KAFKA-1367
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Ryan Berdeen
Assignee: Ashish K Singh
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1367.txt


 When a broker is restarted, the topic metadata responses from the brokers 
 will be incorrect (different from ZooKeeper) until a preferred replica leader 
 election.
 In the metadata, it looks like leaders are correctly removed from the ISR 
 when a broker disappears, but followers are not. Then, when a broker 
 reappears, the ISR is never updated.
 I used a variation of the Vagrant setup created by Joe Stein to reproduce 
 this with latest from the 0.8.1 branch: 
 https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2199) Make signing artifacts optional, setting maven repository possible from command line

2015-05-29 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2199:
-
Attachment: KAFKA-2199_2015-05-29_11:00:44.patch

 Make signing artifacts optional, setting maven repository possible from 
 command line
 

 Key: KAFKA-2199
 URL: https://issues.apache.org/jira/browse/KAFKA-2199
 Project: Kafka
  Issue Type: Bug
  Components: build
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Minor
 Attachments: KAFKA-2199.patch, KAFKA-2199_2015-05-29_11:00:44.patch


 Currently it's annoying to work with snapshot builds if you want to install 
 them rather than just build  test. There are a couple of problems. First, if 
 you try to install (any of the upload* tasks), then you are required to sign 
 the artifacts with a GPG key. Second, the way the variables are defined in 
 gradle.properties seems to make it impossible to override them from the 
 command line. You're required to edit your ~/.gradle/gradle.properties file 
 (which is shared by all gradle projects).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34734: Patch for KAFKA-2226

2015-05-29 Thread Yasuhiro Matsuda


 On May 28, 2015, 7:10 p.m., Jun Rao wrote:
  core/src/main/scala/kafka/utils/timer/TimerTaskList.scala, lines 64-65
  https://reviews.apache.org/r/34734/diff/1/?file=973063#file973063line64
 
  Could you explain a bit why this is needed? It seems that we can add 
  the entry either when it's created for the first time or when it's removed 
  from the current list and needs to be added to a new list during reinsert. 
  In both cases, the list in the entry will be null and there is no need to 
  remove the entry from the list.
 
 Yasuhiro Matsuda wrote:
 I will remove this.

On second thought, I will leave this because this doesn't harm, and this 
ensures the consistency of the data structure without depending on callers to 
do the right thing.


- Yasuhiro


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/#review85596
---


On May 29, 2015, 5:49 p.m., Yasuhiro Matsuda wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34734/
 ---
 
 (Updated May 29, 2015, 5:49 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2226
 https://issues.apache.org/jira/browse/KAFKA-2226
 
 
 Repository: kafka
 
 
 Description
 ---
 
 fix a race condition in TimerTaskEntry.remove
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/utils/timer/Timer.scala 
 b8cde820a770a4e894804f1c268b24b529940650 
   core/src/main/scala/kafka/utils/timer/TimerTask.scala 
 3407138115d579339ffb6b00e32e38c984ac5d6e 
   core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
 e7a96570ddc2367583d6d5590628db7e7f6ba39b 
   core/src/main/scala/kafka/utils/timer/TimingWheel.scala 
 e92aba3844dbf3372182e14536a5d98cf3366d73 
 
 Diff: https://reviews.apache.org/r/34734/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Yasuhiro Matsuda
 




[jira] [Commented] (KAFKA-2199) Make signing artifacts optional, setting maven repository possible from command line

2015-05-29 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565155#comment-14565155
 ] 

Ewen Cheslack-Postava commented on KAFKA-2199:
--

Updated reviewboard https://reviews.apache.org/r/34369/diff/
 against branch origin/trunk

 Make signing artifacts optional, setting maven repository possible from 
 command line
 

 Key: KAFKA-2199
 URL: https://issues.apache.org/jira/browse/KAFKA-2199
 Project: Kafka
  Issue Type: Bug
  Components: build
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Minor
 Attachments: KAFKA-2199.patch, KAFKA-2199_2015-05-29_11:00:44.patch


 Currently it's annoying to work with snapshot builds if you want to install 
 them rather than just build  test. There are a couple of problems. First, if 
 you try to install (any of the upload* tasks), then you are required to sign 
 the artifacts with a GPG key. Second, the way the variables are defined in 
 gradle.properties seems to make it impossible to override them from the 
 command line. You're required to edit your ~/.gradle/gradle.properties file 
 (which is shared by all gradle projects).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34369: Patch for KAFKA-2199

2015-05-29 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34369/
---

(Updated May 29, 2015, 6 p.m.)


Review request for kafka.


Bugs: KAFKA-2199
https://issues.apache.org/jira/browse/KAFKA-2199


Repository: kafka


Description
---

KAFKA-2199: Make signing artifacts optional and disabled by default for 
SNAPSHOTs and allow remote Maven repository configuration from the command line.


Diffs (updated)
-

  README.md 946ec62cc71df93c905c5f35caf5cdb9c78e5c10 
  build.gradle 3dca28eee55e04d4349fbada2079c64b0f1ef6a2 
  gradle.properties 90b1945372e767b9c2d0a50df9eb7063e0629952 

Diff: https://reviews.apache.org/r/34369/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



Re: [Bump] Code review for KIP-11

2015-05-29 Thread Jun Rao
Parth,

I will take a look.

Thanks,

Jun

On Fri, May 29, 2015 at 10:49 AM, Parth Brahmbhatt 
pbrahmbh...@hortonworks.com wrote:

 Hi,

 Can someone please review the following CRs:

 Public entities and interfaces with changes to KafkaAPI and KafkaServer:
 https://reviews.apache.org/r/34492/diff/
 Actual Implementation: https://reviews.apache.org/r/34493/diff/
 CLI: https://reviews.apache.org/r/34494/diff/

 Thanks
 Parth



[jira] [Commented] (KAFKA-188) Support multiple data directories

2015-05-29 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565181#comment-14565181
 ] 

Jay Kreps commented on KAFKA-188:
-

@chenshangan The issue with using data size was that it is very very common to 
create a bunch of topics as once. When you do this all new partitions will be 
put on the same least full partition. Then when data starts being written that 
partition will be totally overloaded.

We can make this configurable, but I think almost anyone who chooses that 
option will get bit by it.

I recommend we instead leave this as it is for initial placement and implement 
rebalancing option that actively migrates partitions to balance data between 
directories. This is harder to implement but I think it is what you actually 
want.

 Support multiple data directories
 -

 Key: KAFKA-188
 URL: https://issues.apache.org/jira/browse/KAFKA-188
 Project: Kafka
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Fix For: 0.8.0

 Attachments: KAFKA-188-v2.patch, KAFKA-188-v3.patch, 
 KAFKA-188-v4.patch, KAFKA-188-v5.patch, KAFKA-188-v6.patch, 
 KAFKA-188-v7.patch, KAFKA-188-v8.patch, KAFKA-188.patch


 Currently we allow only a single data directory. This means that a multi-disk 
 configuration needs to be a RAID array or LVM volume or something like that 
 to be mounted as a single directory.
 For a high-throughput low-reliability configuration this would mean RAID0 
 striping. Common wisdom in Hadoop land has it that a JBOD setup that just 
 mounts each disk as a separate directory and does application-level balancing 
 over these results in about 30% write-improvement. For example see this claim 
 here:
   http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
 It is not clear to me why this would be the case--it seems the RAID 
 controller should be able to balance writes as well as the application so it 
 may depend on the details of the setup.
 Nonetheless this would be really easy to implement, all you need to do is add 
 multiple data directories and balance partition creation over these disks.
 One problem this might cause is if a particular topic is much larger than the 
 others it might unbalance the load across the disks. The partition-disk 
 assignment policy should probably attempt to evenly spread each topic to 
 avoid this, rather than just trying keep the number of partitions balanced 
 between disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2123) Make new consumer offset commit API use callback + future

2015-05-29 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565164#comment-14565164
 ] 

Ewen Cheslack-Postava commented on KAFKA-2123:
--

Updated reviewboard https://reviews.apache.org/r/33196/diff/
 against branch origin/trunk

 Make new consumer offset commit API use callback + future
 -

 Key: KAFKA-2123
 URL: https://issues.apache.org/jira/browse/KAFKA-2123
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
 Fix For: 0.8.3

 Attachments: KAFKA-2123.patch, KAFKA-2123_2015-04-30_11:23:05.patch, 
 KAFKA-2123_2015-05-01_19:33:19.patch, KAFKA-2123_2015-05-04_09:39:50.patch, 
 KAFKA-2123_2015-05-04_22:51:48.patch, KAFKA-2123_2015-05-29_11:11:05.patch


 The current version of the offset commit API in the new consumer is
 void commit(offsets, commit type)
 where the commit type is either sync or async. This means you need to use 
 sync if you ever want confirmation that the commit succeeded. Some 
 applications will want to use asynchronous offset commit, but be able to tell 
 when the commit completes.
 This is basically the same problem that had to be fixed going from old 
 consumer - new consumer and I'd suggest the same fix using a callback + 
 future combination. The new API would be
 FutureVoid commit(MapTopicPartition, Long offsets, ConsumerCommitCallback 
 callback);
 where ConsumerCommitCallback contains a single method:
 public void onCompletion(Exception exception);
 We can provide shorthand variants of commit() for eliding the different 
 arguments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2123) Make new consumer offset commit API use callback + future

2015-05-29 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2123:
-
Attachment: KAFKA-2123_2015-05-29_11:11:05.patch

 Make new consumer offset commit API use callback + future
 -

 Key: KAFKA-2123
 URL: https://issues.apache.org/jira/browse/KAFKA-2123
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
 Fix For: 0.8.3

 Attachments: KAFKA-2123.patch, KAFKA-2123_2015-04-30_11:23:05.patch, 
 KAFKA-2123_2015-05-01_19:33:19.patch, KAFKA-2123_2015-05-04_09:39:50.patch, 
 KAFKA-2123_2015-05-04_22:51:48.patch, KAFKA-2123_2015-05-29_11:11:05.patch


 The current version of the offset commit API in the new consumer is
 void commit(offsets, commit type)
 where the commit type is either sync or async. This means you need to use 
 sync if you ever want confirmation that the commit succeeded. Some 
 applications will want to use asynchronous offset commit, but be able to tell 
 when the commit completes.
 This is basically the same problem that had to be fixed going from old 
 consumer - new consumer and I'd suggest the same fix using a callback + 
 future combination. The new API would be
 FutureVoid commit(MapTopicPartition, Long offsets, ConsumerCommitCallback 
 callback);
 where ConsumerCommitCallback contains a single method:
 public void onCompletion(Exception exception);
 We can provide shorthand variants of commit() for eliding the different 
 arguments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance

2015-05-29 Thread Yasuhiro Matsuda (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565522#comment-14565522
 ] 

Yasuhiro Matsuda commented on KAFKA-2226:
-

Updated reviewboard https://reviews.apache.org/r/34734/diff/
 against branch origin/trunk

 NullPointerException in TestPurgatoryPerformance
 

 Key: KAFKA-2226
 URL: https://issues.apache.org/jira/browse/KAFKA-2226
 Project: Kafka
  Issue Type: Bug
Reporter: Onur Karaman
Assignee: Yasuhiro Matsuda
 Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, 
 KAFKA-2226_2015-05-29_10:49:34.patch, KAFKA-2226_2015-05-29_15:04:35.patch


 A NullPointerException sometimes shows up in TimerTaskList.remove while 
 running TestPurgatoryPerformance. I’m on the HEAD of trunk.
 {code}
  ./bin/kafka-run-class.sh kafka.TestPurgatoryPerformance --key-space-size 
  10 --keys 3 --num 10 --pct50 50 --pct75 75 --rate 1000 --size 50 
  --timeout 20
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 [2015-05-27 10:02:14,782] ERROR [completion thread], Error due to  
 (kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1)
 java.lang.NullPointerException
   at kafka.utils.timer.TimerTaskList.remove(TimerTaskList.scala:80)
   at kafka.utils.timer.TimerTaskEntry.remove(TimerTaskList.scala:128)
   at kafka.utils.timer.TimerTask$class.cancel(TimerTask.scala:27)
   at kafka.server.DelayedOperation.cancel(DelayedOperation.scala:50)
   at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:71)
   at 
 kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1.doWork(TestPurgatoryPerformance.scala:263)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-188) Support multiple data directories

2015-05-29 Thread chenshangan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565723#comment-14565723
 ] 

chenshangan commented on KAFKA-188:
---

[~jkreps] 
I recommend we instead leave this as it is for initial placement and implement 
rebalancing option that actively migrates partitions to balance data between 
directories. This is harder to implement but I think it is what you actually 
want.

Exactly, this is what I really want, but it's pretty hard to implement. And in 
our use case, we seldom create a bunch of topics at the same time, topics are 
increasing day by day.

Common use case:
1. a new kafka cluster setup, lots of topics from other kafka cluster or system 
dump data into this new cluster. segments determined policy works well as all 
topics are started from zero, so segments are consistent with partitions.

2. an existing  kafka cluster, topics are added day by day. This is the ideal 
case, segments policy will work well. 

3. an existing kafka cluster, topics are added in bunch. It might cause all new 
topics being put on the same least directory, of course it will cause bad 
consequence. But if the cluster is big enough and disk counts and capacity of a 
broker is big enough, and this is not a common use case, the consequence will 
not be so serious. Users use this option should consider how to avoid such 
situation.

Above all, it's worthy providing such an option. But If we can implement a 
rebalancing option, it would be perfect.   


 Support multiple data directories
 -

 Key: KAFKA-188
 URL: https://issues.apache.org/jira/browse/KAFKA-188
 Project: Kafka
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Fix For: 0.8.0

 Attachments: KAFKA-188-v2.patch, KAFKA-188-v3.patch, 
 KAFKA-188-v4.patch, KAFKA-188-v5.patch, KAFKA-188-v6.patch, 
 KAFKA-188-v7.patch, KAFKA-188-v8.patch, KAFKA-188.patch


 Currently we allow only a single data directory. This means that a multi-disk 
 configuration needs to be a RAID array or LVM volume or something like that 
 to be mounted as a single directory.
 For a high-throughput low-reliability configuration this would mean RAID0 
 striping. Common wisdom in Hadoop land has it that a JBOD setup that just 
 mounts each disk as a separate directory and does application-level balancing 
 over these results in about 30% write-improvement. For example see this claim 
 here:
   http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
 It is not clear to me why this would be the case--it seems the RAID 
 controller should be able to balance writes as well as the application so it 
 may depend on the details of the setup.
 Nonetheless this would be really easy to implement, all you need to do is add 
 multiple data directories and balance partition creation over these disks.
 One problem this might cause is if a particular topic is much larger than the 
 others it might unbalance the load across the disks. The partition-disk 
 assignment policy should probably attempt to evenly spread each topic to 
 avoid this, rather than just trying keep the number of partitions balanced 
 between disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2199) Make signing artifacts optional, setting maven repository possible from command line

2015-05-29 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-2199:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed.

 Make signing artifacts optional, setting maven repository possible from 
 command line
 

 Key: KAFKA-2199
 URL: https://issues.apache.org/jira/browse/KAFKA-2199
 Project: Kafka
  Issue Type: Bug
  Components: build
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Minor
 Attachments: KAFKA-2199.patch, KAFKA-2199_2015-05-29_11:00:44.patch


 Currently it's annoying to work with snapshot builds if you want to install 
 them rather than just build  test. There are a couple of problems. First, if 
 you try to install (any of the upload* tasks), then you are required to sign 
 the artifacts with a GPG key. Second, the way the variables are defined in 
 gradle.properties seems to make it impossible to override them from the 
 command line. You're required to edit your ~/.gradle/gradle.properties file 
 (which is shared by all gradle projects).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33196: Patch for KAFKA-2123

2015-05-29 Thread Jay Kreps

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33196/#review85824
---



clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java
https://reviews.apache.org/r/33196/#comment137644

What would be the use case for
   commit(CommitType.SYNC, mycallback)?
   
That is, if the commit is synchronous can't you always just do your stuff 
when it returns?



clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java
https://reviews.apache.org/r/33196/#comment137645

There are a bunch of places we need to possibly retry. Does it make sense 
to configure these seperately or just have a bulk retries config? (I'm not sure 
what my opinion is).



clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java
https://reviews.apache.org/r/33196/#comment137646

I think this comment is now a little out of date as this block just 
initiates.



clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java
https://reviews.apache.org/r/33196/#comment137647

Doesn't this loop until the timeout expires? The prior logic polled until 
either a record arrived or the timeout expired which I think is what we want, 
but I may be misunderstanding.



clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
https://reviews.apache.org/r/33196/#comment137648

Are offset commits the only thing we need/want to queue this way? I wonder 
if there isn't a kind of BlockingClient that does this for youpresumably 
would be good for admin too.



clients/src/main/java/org/apache/kafka/common/errors/OffsetLoadInProgressException.java
https://reviews.apache.org/r/33196/#comment137651

Are these new exceptions getting thrown back to the user now?


- Jay Kreps


On May 29, 2015, 6:11 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33196/
 ---
 
 (Updated May 29, 2015, 6:11 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2123
 https://issues.apache.org/jira/browse/KAFKA-2123
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-2123: Add queuing of offset commit requests.
 
 
 KAFKA-2123: Add scheduler for delayed tasks in new consumer, add backoff for 
 commit retries, and simplify auto commit by using delayed tasks.
 
 
 KAFKA-2123: Make synchronous offset commits wait for previous requests to 
 finish in order.
 
 
 KAFKA-2123: Remove redundant calls to ensureNotClosed
 
 
 KAFKA-2123: Address review comments.
 
 
 Diffs
 -
 
   clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
 8f587bc0705b65b3ef37c86e0c25bb43ab8803de 
   
 clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerCommitCallback.java
  PRE-CREATION 
   clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
 bdff518b732105823058e6182f445248b45dc388 
   clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
 d301be4709f7b112e1f3a39f3c04cfa65f00fa60 
   clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
 f50da825756938c193d7f07bee953e000e2627d9 
   
 clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
  b2764df11afa7a99fce46d1ff48960d889032d14 
   
 clients/src/main/java/org/apache/kafka/clients/consumer/internals/DelayedTask.java
  PRE-CREATION 
   
 clients/src/main/java/org/apache/kafka/clients/consumer/internals/DelayedTaskQueue.java
  PRE-CREATION 
   
 clients/src/main/java/org/apache/kafka/common/errors/ConsumerCoordinatorNotAvailableException.java
  PRE-CREATION 
   
 clients/src/main/java/org/apache/kafka/common/errors/NotCoordinatorForConsumerException.java
  PRE-CREATION 
   
 clients/src/main/java/org/apache/kafka/common/errors/OffsetLoadInProgressException.java
  PRE-CREATION 
   clients/src/main/java/org/apache/kafka/common/protocol/Errors.java 
 5b898c8f8ad5d0469f469b600c4b2eb13d1fc662 
   
 clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java
  b06c4a73e2b4e9472cd772c8bc32bf4a29f431bb 
   
 clients/src/test/java/org/apache/kafka/clients/consumer/internals/DelayedTaskQueueTest.java
  PRE-CREATION 
   core/src/test/scala/integration/kafka/api/ConsumerTest.scala 
 a1eed965a148eb19d9a6cefbfce131f58aaffc24 
 
 Diff: https://reviews.apache.org/r/33196/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




Re: Review Request 34369: Patch for KAFKA-2199

2015-05-29 Thread Jay Kreps

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34369/#review85823
---

Ship it!


Ship It!

- Jay Kreps


On May 29, 2015, 6 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34369/
 ---
 
 (Updated May 29, 2015, 6 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2199
 https://issues.apache.org/jira/browse/KAFKA-2199
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-2199: Make signing artifacts optional and disabled by default for 
 SNAPSHOTs and allow remote Maven repository configuration from the command 
 line.
 
 
 Diffs
 -
 
   README.md 946ec62cc71df93c905c5f35caf5cdb9c78e5c10 
   build.gradle 3dca28eee55e04d4349fbada2079c64b0f1ef6a2 
   gradle.properties 90b1945372e767b9c2d0a50df9eb7063e0629952 
 
 Diff: https://reviews.apache.org/r/34369/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




Re: Review Request 34734: Patch for KAFKA-2226

2015-05-29 Thread Yasuhiro Matsuda

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/
---

(Updated May 29, 2015, 10:10 p.m.)


Review request for kafka.


Bugs: KAFKA-2226
https://issues.apache.org/jira/browse/KAFKA-2226


Repository: kafka


Description
---

fix a race condition in TimerTaskEntry.remove


Diffs (updated)
-

  core/src/main/scala/kafka/utils/timer/Timer.scala 
b8cde820a770a4e894804f1c268b24b529940650 
  core/src/main/scala/kafka/utils/timer/TimerTask.scala 
3407138115d579339ffb6b00e32e38c984ac5d6e 
  core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
e7a96570ddc2367583d6d5590628db7e7f6ba39b 
  core/src/main/scala/kafka/utils/timer/TimingWheel.scala 
e92aba3844dbf3372182e14536a5d98cf3366d73 

Diff: https://reviews.apache.org/r/34734/diff/


Testing
---


Thanks,

Yasuhiro Matsuda



Re: Review Request 34734: Patch for KAFKA-2226

2015-05-29 Thread Yasuhiro Matsuda

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/
---

(Updated May 29, 2015, 10:04 p.m.)


Review request for kafka.


Bugs: KAFKA-2226
https://issues.apache.org/jira/browse/KAFKA-2226


Repository: kafka


Description
---

fix a race condition in TimerTaskEntry.remove


Diffs (updated)
-

  core/src/main/scala/kafka/utils/timer/Timer.scala 
b8cde820a770a4e894804f1c268b24b529940650 
  core/src/main/scala/kafka/utils/timer/TimerTask.scala 
3407138115d579339ffb6b00e32e38c984ac5d6e 
  core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
e7a96570ddc2367583d6d5590628db7e7f6ba39b 
  core/src/main/scala/kafka/utils/timer/TimingWheel.scala 
e92aba3844dbf3372182e14536a5d98cf3366d73 

Diff: https://reviews.apache.org/r/34734/diff/


Testing
---


Thanks,

Yasuhiro Matsuda



[jira] [Updated] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance

2015-05-29 Thread Yasuhiro Matsuda (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yasuhiro Matsuda updated KAFKA-2226:

Attachment: KAFKA-2226_2015-05-29_15:04:35.patch

 NullPointerException in TestPurgatoryPerformance
 

 Key: KAFKA-2226
 URL: https://issues.apache.org/jira/browse/KAFKA-2226
 Project: Kafka
  Issue Type: Bug
Reporter: Onur Karaman
Assignee: Yasuhiro Matsuda
 Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, 
 KAFKA-2226_2015-05-29_10:49:34.patch, KAFKA-2226_2015-05-29_15:04:35.patch


 A NullPointerException sometimes shows up in TimerTaskList.remove while 
 running TestPurgatoryPerformance. I’m on the HEAD of trunk.
 {code}
  ./bin/kafka-run-class.sh kafka.TestPurgatoryPerformance --key-space-size 
  10 --keys 3 --num 10 --pct50 50 --pct75 75 --rate 1000 --size 50 
  --timeout 20
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 [2015-05-27 10:02:14,782] ERROR [completion thread], Error due to  
 (kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1)
 java.lang.NullPointerException
   at kafka.utils.timer.TimerTaskList.remove(TimerTaskList.scala:80)
   at kafka.utils.timer.TimerTaskEntry.remove(TimerTaskList.scala:128)
   at kafka.utils.timer.TimerTask$class.cancel(TimerTask.scala:27)
   at kafka.server.DelayedOperation.cancel(DelayedOperation.scala:50)
   at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:71)
   at 
 kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1.doWork(TestPurgatoryPerformance.scala:263)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance

2015-05-29 Thread Yasuhiro Matsuda (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565538#comment-14565538
 ] 

Yasuhiro Matsuda commented on KAFKA-2226:
-

Updated reviewboard https://reviews.apache.org/r/34734/diff/
 against branch origin/trunk

 NullPointerException in TestPurgatoryPerformance
 

 Key: KAFKA-2226
 URL: https://issues.apache.org/jira/browse/KAFKA-2226
 Project: Kafka
  Issue Type: Bug
Reporter: Onur Karaman
Assignee: Yasuhiro Matsuda
 Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, 
 KAFKA-2226_2015-05-29_10:49:34.patch, KAFKA-2226_2015-05-29_15:04:35.patch, 
 KAFKA-2226_2015-05-29_15:10:24.patch


 A NullPointerException sometimes shows up in TimerTaskList.remove while 
 running TestPurgatoryPerformance. I’m on the HEAD of trunk.
 {code}
  ./bin/kafka-run-class.sh kafka.TestPurgatoryPerformance --key-space-size 
  10 --keys 3 --num 10 --pct50 50 --pct75 75 --rate 1000 --size 50 
  --timeout 20
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 [2015-05-27 10:02:14,782] ERROR [completion thread], Error due to  
 (kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1)
 java.lang.NullPointerException
   at kafka.utils.timer.TimerTaskList.remove(TimerTaskList.scala:80)
   at kafka.utils.timer.TimerTaskEntry.remove(TimerTaskList.scala:128)
   at kafka.utils.timer.TimerTask$class.cancel(TimerTask.scala:27)
   at kafka.server.DelayedOperation.cancel(DelayedOperation.scala:50)
   at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:71)
   at 
 kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1.doWork(TestPurgatoryPerformance.scala:263)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance

2015-05-29 Thread Yasuhiro Matsuda (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yasuhiro Matsuda updated KAFKA-2226:

Attachment: KAFKA-2226_2015-05-29_15:10:24.patch

 NullPointerException in TestPurgatoryPerformance
 

 Key: KAFKA-2226
 URL: https://issues.apache.org/jira/browse/KAFKA-2226
 Project: Kafka
  Issue Type: Bug
Reporter: Onur Karaman
Assignee: Yasuhiro Matsuda
 Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, 
 KAFKA-2226_2015-05-29_10:49:34.patch, KAFKA-2226_2015-05-29_15:04:35.patch, 
 KAFKA-2226_2015-05-29_15:10:24.patch


 A NullPointerException sometimes shows up in TimerTaskList.remove while 
 running TestPurgatoryPerformance. I’m on the HEAD of trunk.
 {code}
  ./bin/kafka-run-class.sh kafka.TestPurgatoryPerformance --key-space-size 
  10 --keys 3 --num 10 --pct50 50 --pct75 75 --rate 1000 --size 50 
  --timeout 20
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 [2015-05-27 10:02:14,782] ERROR [completion thread], Error due to  
 (kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1)
 java.lang.NullPointerException
   at kafka.utils.timer.TimerTaskList.remove(TimerTaskList.scala:80)
   at kafka.utils.timer.TimerTaskEntry.remove(TimerTaskList.scala:128)
   at kafka.utils.timer.TimerTask$class.cancel(TimerTask.scala:27)
   at kafka.server.DelayedOperation.cancel(DelayedOperation.scala:50)
   at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:71)
   at 
 kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1.doWork(TestPurgatoryPerformance.scala:263)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-05-29 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565837#comment-14565837
 ] 

Abhishek Nigam commented on KAFKA-1778:
---

I believe what you are suggesting is that we can have a group of brokers 
flagged as potential brokers and all controller elections will be limited to 
that subset of brokers. Do I need to provide any failsafe in case all the 
flagged brokers are not able to participate in the required election and we are 
controller-less?

-Abhishek

 Create new re-elect controller admin function
 -

 Key: KAFKA-1778
 URL: https://issues.apache.org/jira/browse/KAFKA-1778
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
Assignee: Abhishek Nigam
 Fix For: 0.8.3


 kafka --controller --elect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2213) Log cleaner should write compacted messages using configured compression type

2015-05-29 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy updated KAFKA-2213:
--
Reviewer: Joel Koshy

 Log cleaner should write compacted messages using configured compression type
 -

 Key: KAFKA-2213
 URL: https://issues.apache.org/jira/browse/KAFKA-2213
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Manikumar Reddy
 Attachments: KAFKA-2213.patch, KAFKA-2213_2015-05-30_00:23:01.patch


 In KAFKA-1374 the log cleaner was improved to handle compressed messages. 
 There were a couple of follow-ups from that:
 * We write compacted messages using the original compression type in the 
 compressed message-set. We should instead append all retained messages with 
 the configured broker compression type of the topic.
 * While compressing messages we should ideally do some batching before 
 compression.
 * Investigate the use of the client compressor. (See the discussion in the 
 RBs for KAFKA-1374)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34734: Patch for KAFKA-2226

2015-05-29 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/#review85764
---


Thanks for the new patch. A few more comments below.


core/src/main/scala/kafka/utils/timer/Timer.scala
https://reviews.apache.org/r/34734/#comment137523

canceled = cancelled



core/src/main/scala/kafka/utils/timer/Timer.scala
https://reviews.apache.org/r/34734/#comment137522

Our current style is to not wrap single-line statement in brackets. Ditto 
for a few places below.



core/src/main/scala/kafka/utils/timer/TimerTaskList.scala
https://reviews.apache.org/r/34734/#comment137548

It seems that if the task entry is still in another list during the add 
call, a deadlock can happen.

Suppose that a task entry is initially in taskList1. The expiration thread 
tries to remove the task entry from taskList1 and to insert it into taskList2. 
The call gets all the way to before line 68. The expiration thread is holding a 
lock on the task entry and taskList2. Now, another thread thread1 tries to 
remove the task entry from taskList1. It grabs the lock on taskList1 and then 
tries to acquire the lock on the task entry, but can't since the expiration 
thread is holding it. The expiration thread resumes in line 68 and tries to 
grab the lock on taskList1, but can't since thread1 is holding it. Now, we are 
in a deadlock.

It seems that this won't happen in our usage since we always remove an 
existing task entry from a list before reinserting it to another list. Because 
of this, add() will never need to hold the lock on two task lists. Not sure if 
it's better to just enforce this assumption or make the code more general than 
we currently need. If we do the former, not sure if we still need to double 
sync on the list and the task entry.



core/src/main/scala/kafka/utils/timer/TimerTaskList.scala
https://reviews.apache.org/r/34734/#comment137519

canceled - cancelled



core/src/main/scala/kafka/utils/timer/TimingWheel.scala
https://reviews.apache.org/r/34734/#comment137520

canceled - cancelled


- Jun Rao


On May 29, 2015, 5:49 p.m., Yasuhiro Matsuda wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34734/
 ---
 
 (Updated May 29, 2015, 5:49 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2226
 https://issues.apache.org/jira/browse/KAFKA-2226
 
 
 Repository: kafka
 
 
 Description
 ---
 
 fix a race condition in TimerTaskEntry.remove
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/utils/timer/Timer.scala 
 b8cde820a770a4e894804f1c268b24b529940650 
   core/src/main/scala/kafka/utils/timer/TimerTask.scala 
 3407138115d579339ffb6b00e32e38c984ac5d6e 
   core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
 e7a96570ddc2367583d6d5590628db7e7f6ba39b 
   core/src/main/scala/kafka/utils/timer/TimingWheel.scala 
 e92aba3844dbf3372182e14536a5d98cf3366d73 
 
 Diff: https://reviews.apache.org/r/34734/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Yasuhiro Matsuda
 




Re: [DISCUSSION] Partition Selection and Coordination By Brokers for Producers

2015-05-29 Thread Bhavesh Mistry
Hi Kafka Dev Team,

I would appreciate your feedback on moving producer partition selection
from producer to Broker.   Also, please do let me know what is correct
process of collecting feedback from Kafka Dev team and/or community.

Thanks,

Bhavesh

On Tue, May 26, 2015 at 11:54 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com
 wrote:

 Hi Kafka Dev Team,

 I am sorry I am new to process of discussion and/or KIP.  So, I had
 commented  other email voting chain.  Please do let me know correct process
 for collecting and staring discussion with Kafka Dev Group.

 Here is original message:

 I have had experience with both producer and consumer side.  I have
 different  use case on this partition selection strategy.



 Problem :


 We have heterogeneous environment of producers (by that I mean we have
 node js, python, New Java  Old Scala Based producers to same topic).   I
 have seen that not all producers employ round-robing strategies for
 non-keyed message like new producer does.  Hence, it creates non uniform
 data ingestion into partition and delay in overall message processing.

 How to address uniform distribution/message injection rate to all
 partitions ?



 Propose Solution:


 Let broker cluster decide the next partition for topic to send data rather
 than producer itself with more intelligence.

 1)   When sending data to brokers (ProduceResponse) Kafka Protocol over
 the wire send hint to client which partition to send based on following
 logic (Or can be customizable)

 a. Based on overall data injection rate for topic and current
 producer injection rate

 b. Ability rank partition based on consumer rate (Advance Use Case as
 there may be many consumers so weighted average etc... )



 Untimely, brokers will coordinate among thousand of producers and divert
 data injection  rate (out-of-box feature) and consumption rate (pluggable
 interface implementation on brokers’ side).  The goal  here is to attain
 uniformity and/or lower delivery rate to consumer.  This is similar to
 consumer coordination moving to brokers. The producer side partition
 selection would also move to brokers.  This will benefit both java and
 non-java clients.



 Please let me know your feedback on this subject matter.  I am sure lots
 of you run  Kafka in Enterprise Environment where you may have different
 type of producers for same topic (e.g logging client in JavaScript, PHP,
 Java and Python etc sending to log topic).  I would really appreciate your
 feedback on this.





 Thanks,


 Bhavesh



Re: Review Request 34734: Patch for KAFKA-2226

2015-05-29 Thread Yasuhiro Matsuda


 On May 29, 2015, 7:08 p.m., Jun Rao wrote:
  core/src/main/scala/kafka/utils/timer/Timer.scala, line 54
  https://reviews.apache.org/r/34734/diff/3/?file=974375#file974375line54
 
  canceled = cancelled

I will fix it. By the way, canceled is a legitimate spelling in American 
English.


- Yasuhiro


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/#review85764
---


On May 29, 2015, 5:49 p.m., Yasuhiro Matsuda wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34734/
 ---
 
 (Updated May 29, 2015, 5:49 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2226
 https://issues.apache.org/jira/browse/KAFKA-2226
 
 
 Repository: kafka
 
 
 Description
 ---
 
 fix a race condition in TimerTaskEntry.remove
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/utils/timer/Timer.scala 
 b8cde820a770a4e894804f1c268b24b529940650 
   core/src/main/scala/kafka/utils/timer/TimerTask.scala 
 3407138115d579339ffb6b00e32e38c984ac5d6e 
   core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
 e7a96570ddc2367583d6d5590628db7e7f6ba39b 
   core/src/main/scala/kafka/utils/timer/TimingWheel.scala 
 e92aba3844dbf3372182e14536a5d98cf3366d73 
 
 Diff: https://reviews.apache.org/r/34734/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Yasuhiro Matsuda