[jira] [Created] (KAFKA-1674) auto.create.topics.enable docs are misleading

2014-10-06 Thread Stevo Slavic (JIRA)
Stevo Slavic created KAFKA-1674:
---

 Summary: auto.create.topics.enable docs are misleading
 Key: KAFKA-1674
 URL: https://issues.apache.org/jira/browse/KAFKA-1674
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Stevo Slavic
Priority: Minor


{{auto.create.topics.enable}} is currently 
[documented|http://kafka.apache.org/08/configuration.html] with
{quote}
Enable auto creation of topic on the server. If this is set to true then 
attempts to produce, consume, or fetch metadata for a non-existent topic will 
automatically create it with the default replication factor and number of 
partitions.
{quote}

In reality, in Kafka 0.8.1.1, topics are only created when trying to publish a 
message to a non-existing topic.

After 
[discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
 with [~junrao], the conclusion was that it's a documentation issue which needs 
to be fixed.

Please check once more whether this is just non-working functionality. If it is 
a docs-only issue, and implicit topic creation should work only for the 
producer, consider moving the configuration property (docs only, but maybe code 
also?) from the broker configuration options to the producer configuration 
options.
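For reference, a minimal sketch of the broker-side server.properties keys 
involved in implicit topic creation; the values shown are the documented 0.8.x 
defaults and are worth re-verifying against the configuration page:

```properties
# Broker settings governing implicit topic creation (documented 0.8.x defaults)
auto.create.topics.enable=true     # documented to apply to produce, consume, and metadata fetch
num.partitions=1                   # partition count used for auto-created topics
default.replication.factor=1       # replication factor used for auto-created topics
```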



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1675) bootstrapping tidy-up

2014-10-06 Thread Szczepan Faber (JIRA)
Szczepan Faber created KAFKA-1675:
-

 Summary: bootstrapping tidy-up
 Key: KAFKA-1675
 URL: https://issues.apache.org/jira/browse/KAFKA-1675
 Project: Kafka
  Issue Type: Bug
Reporter: Szczepan Faber


I'd like to suggest the following changes:

1. Remove the 'gradlew' and 'gradlew.bat' scripts from the source tree. Those 
scripts don't work, e.g. they fail with an exception when invoked. I just got a 
user report where those scripts were invoked by the user and it led to an 
exception that was not easy to grasp. The bootstrapping step will generate 
those files anyway.

2. Move the 'gradleVersion' extra property from 'build.gradle' into 
'gradle.properties'. Otherwise it is hard to automate the bootstrapping 
process: in order to find out the Gradle version, I need to evaluate the build 
script, and for that I need Gradle with the correct version (kind of a vicious 
circle). Project properties declared in the gradle.properties file can be 
accessed exactly the same way as 'ext' properties, for example: 
'project.gradleVersion'.
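Point 2 can be sketched as follows (the version number is hypothetical; the 
point is only that a key in gradle.properties resolves through the same 
project-property lookup an 'ext' property would):

```groovy
// gradle.properties -- plain key=value, readable by external tooling
// without evaluating the build script:
//     gradleVersion=1.6

// build.gradle -- the project property resolves exactly like an ext property:
task printGradleVersion {
    doLast {
        // project.gradleVersion now comes from gradle.properties
        println "Expected Gradle version: ${project.gradleVersion}"
    }
}
```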



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-1671) uploaded archives are missing for Scala version 2.11

2014-10-06 Thread Ivan Lyutov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Lyutov reassigned KAFKA-1671:
--

Assignee: Ivan Lyutov

> uploaded archives are missing for Scala version 2.11
> 
>
> Key: KAFKA-1671
> URL: https://issues.apache.org/jira/browse/KAFKA-1671
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Ivan Lyutov
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.8.2
>
>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1674) auto.create.topics.enable docs are misleading

2014-10-06 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic updated KAFKA-1674:

Description: 
{{auto.create.topics.enable}} is currently 
[documented|http://kafka.apache.org/08/configuration.html] with
{quote}
Enable auto creation of topic on the server. If this is set to true then 
attempts to produce, consume, or fetch metadata for a non-existent topic will 
automatically create it with the default replication factor and number of 
partitions.
{quote}

In reality, in Kafka 0.8.1.1, topics are only created when trying to publish a 
message to a non-existing topic.

After 
[discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
 with [~junrao], the conclusion was that it's a documentation issue which needs 
to be fixed.

Please check once more whether this is just non-working functionality. If it is 
a docs-only issue, and implicit topic creation should work only for the 
producer, consider moving {{auto.create.topics.enable}} and other topic 
auto-creation-related configuration properties (docs only, but maybe code 
also?) from the broker configuration options to the producer configuration 
options.

  was:
{{auto.create.topics.enable}} is currently 
[documented|http://kafka.apache.org/08/configuration.html] with
{quote}
Enable auto creation of topic on the server. If this is set to true then 
attempts to produce, consume, or fetch metadata for a non-existent topic will 
automatically create it with the default replication factor and number of 
partitions.
{quote}

In reality, in Kafka 0.8.1.1, topics are only created when trying to publish a 
message to a non-existing topic.

After 
[discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
 with [~junrao], the conclusion was that it's a documentation issue which needs 
to be fixed.

Please check once more whether this is just non-working functionality. If it is 
a docs-only issue, and implicit topic creation should work only for the 
producer, consider moving the configuration property (docs only, but maybe code 
also?) from the broker configuration options to the producer configuration 
options.


> auto.create.topics.enable docs are misleading
> -
>
> Key: KAFKA-1674
> URL: https://issues.apache.org/jira/browse/KAFKA-1674
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Stevo Slavic
>Priority: Minor
>
> {{auto.create.topics.enable}} is currently 
> [documented|http://kafka.apache.org/08/configuration.html] with
> {quote}
> Enable auto creation of topic on the server. If this is set to true then 
> attempts to produce, consume, or fetch metadata for a non-existent topic will 
> automatically create it with the default replication factor and number of 
> partitions.
> {quote}
> In reality, in Kafka 0.8.1.1, topics are only created when trying to publish 
> a message to a non-existing topic.
> After 
> [discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
>  with [~junrao], the conclusion was that it's a documentation issue which 
> needs to be fixed.
> Please check once more whether this is just non-working functionality. If it 
> is a docs-only issue, and implicit topic creation should work only for the 
> producer, consider moving {{auto.create.topics.enable}} and other topic 
> auto-creation-related configuration properties (docs only, but maybe code 
> also?) from the broker configuration options to the producer configuration 
> options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1644) Inherit FetchResponse from RequestOrResponse

2014-10-06 Thread Anton Karamanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Karamanov updated KAFKA-1644:
---
Attachment: 0002-KAFKA-1644-Inherit-FetchResponse-from-RequestOrRespo.patch

Maybe it will be OK to just leave {{writeTo}} empty and add a comment with an 
explanation?

Consider the following 
[patch|^0002-KAFKA-1644-Inherit-FetchResponse-from-RequestOrRespo.patch].

> Inherit FetchResponse from RequestOrResponse
> 
>
> Key: KAFKA-1644
> URL: https://issues.apache.org/jira/browse/KAFKA-1644
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anton Karamanov
>Assignee: Anton Karamanov
> Attachments: 
> 0001-KAFKA-1644-Inherit-FetchResponse-from-RequestOrRespo.patch, 
> 0002-KAFKA-1644-Inherit-FetchResponse-from-RequestOrRespo.patch
>
>
> Unlike all other Kafka API responses, {{FetchResponse}} is not a subclass of 
> {{RequestOrResponse}}, which requires handling it as a special case while 
> processing responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1675) bootstrapping tidy-up

2014-10-06 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1675:
-
Fix Version/s: 0.8.2
 Assignee: Ivan Lyutov

> bootstrapping tidy-up
> -
>
> Key: KAFKA-1675
> URL: https://issues.apache.org/jira/browse/KAFKA-1675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Szczepan Faber
>Assignee: Ivan Lyutov
> Fix For: 0.8.2
>
>
> I'd like to suggest the following changes:
> 1. Remove the 'gradlew' and 'gradlew.bat' scripts from the source tree. Those 
> scripts don't work, e.g. they fail with an exception when invoked. I just got 
> a user report where those scripts were invoked by the user and it led to an 
> exception that was not easy to grasp. The bootstrapping step will generate 
> those files anyway.
> 2. Move the 'gradleVersion' extra property from 'build.gradle' into 
> 'gradle.properties'. Otherwise it is hard to automate the bootstrapping 
> process: in order to find out the Gradle version, I need to evaluate the 
> build script, and for that I need Gradle with the correct version (kind of a 
> vicious circle). Project properties declared in the gradle.properties file 
> can be accessed exactly the same way as 'ext' properties, for example: 
> 'project.gradleVersion'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 26360: Patch for KAFKA-1671

2014-10-06 Thread Ivan Lyutov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26360/
---

Review request for kafka.


Bugs: KAFKA-1671
https://issues.apache.org/jira/browse/KAFKA-1671


Repository: kafka


Description
---

KAFKA-1671 - uploaded archives are missing for Scala version 2.11


Diffs
-

  build.gradle 2e488a1ab0437e6aaf3221a938690cd2d98ecda8 

Diff: https://reviews.apache.org/r/26360/diff/


Testing
---


Thanks,

Ivan Lyutov



[jira] [Updated] (KAFKA-1671) uploaded archives are missing for Scala version 2.11

2014-10-06 Thread Ivan Lyutov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Lyutov updated KAFKA-1671:
---
Attachment: KAFKA-1671.patch

> uploaded archives are missing for Scala version 2.11
> 
>
> Key: KAFKA-1671
> URL: https://issues.apache.org/jira/browse/KAFKA-1671
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Ivan Lyutov
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.8.2
>
> Attachments: KAFKA-1671.patch
>
>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1671) uploaded archives are missing for Scala version 2.11

2014-10-06 Thread Ivan Lyutov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Lyutov updated KAFKA-1671:
---
Status: Patch Available  (was: Open)

> uploaded archives are missing for Scala version 2.11
> 
>
> Key: KAFKA-1671
> URL: https://issues.apache.org/jira/browse/KAFKA-1671
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Ivan Lyutov
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.8.2
>
> Attachments: KAFKA-1671.patch
>
>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1671) uploaded archives are missing for Scala version 2.11

2014-10-06 Thread Ivan Lyutov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160292#comment-14160292
 ] 

Ivan Lyutov commented on KAFKA-1671:


Created reviewboard https://reviews.apache.org/r/26360/diff/
 against branch apache/trunk

> uploaded archives are missing for Scala version 2.11
> 
>
> Key: KAFKA-1671
> URL: https://issues.apache.org/jira/browse/KAFKA-1671
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Ivan Lyutov
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.8.2
>
> Attachments: KAFKA-1671.patch
>
>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1674) auto.create.topics.enable docs are misleading

2014-10-06 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic updated KAFKA-1674:

Description: 
{{auto.create.topics.enable}} is currently 
[documented|http://kafka.apache.org/08/configuration.html] with
{quote}
Enable auto creation of topic on the server. If this is set to true then 
attempts to produce, consume, or fetch metadata for a non-existent topic will 
automatically create it with the default replication factor and number of 
partitions.
{quote}

In reality, in Kafka 0.8.1.1, topics are only created when trying to publish a 
message to a non-existing topic.

After 
[discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
 with [~junrao], the conclusion was that it's a documentation issue which needs 
to be fixed.

Please check once more whether this is just non-working functionality. If it is 
a docs-only issue, and implicit topic creation should work only for the 
producer, consider moving {{auto.create.topics.enable}} and maybe also 
{{num.partitions}}, {{default.replication.factor}}, and any other topic 
auto-creation-related configuration properties (docs only, but maybe code 
also?) from the broker configuration options to the producer configuration 
options.

  was:
{{auto.create.topics.enable}} is currently 
[documented|http://kafka.apache.org/08/configuration.html] with
{quote}
Enable auto creation of topic on the server. If this is set to true then 
attempts to produce, consume, or fetch metadata for a non-existent topic will 
automatically create it with the default replication factor and number of 
partitions.
{quote}

In reality, in Kafka 0.8.1.1, topics are only created when trying to publish a 
message to a non-existing topic.

After 
[discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
 with [~junrao], the conclusion was that it's a documentation issue which needs 
to be fixed.

Please check once more whether this is just non-working functionality. If it is 
a docs-only issue, and implicit topic creation should work only for the 
producer, consider moving {{auto.create.topics.enable}} and other topic 
auto-creation-related configuration properties (docs only, but maybe code 
also?) from the broker configuration options to the producer configuration 
options.


> auto.create.topics.enable docs are misleading
> -
>
> Key: KAFKA-1674
> URL: https://issues.apache.org/jira/browse/KAFKA-1674
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Stevo Slavic
>Priority: Minor
>
> {{auto.create.topics.enable}} is currently 
> [documented|http://kafka.apache.org/08/configuration.html] with
> {quote}
> Enable auto creation of topic on the server. If this is set to true then 
> attempts to produce, consume, or fetch metadata for a non-existent topic will 
> automatically create it with the default replication factor and number of 
> partitions.
> {quote}
> In reality, in Kafka 0.8.1.1, topics are only created when trying to publish 
> a message to a non-existing topic.
> After 
> [discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
>  with [~junrao], the conclusion was that it's a documentation issue which 
> needs to be fixed.
> Please check once more whether this is just non-working functionality. If it 
> is a docs-only issue, and implicit topic creation should work only for the 
> producer, consider moving {{auto.create.topics.enable}} and maybe also 
> {{num.partitions}}, {{default.replication.factor}}, and any other topic 
> auto-creation-related configuration properties (docs only, but maybe code 
> also?) from the broker configuration options to the producer configuration 
> options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1675) bootstrapping tidy-up

2014-10-06 Thread Ivan Lyutov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Lyutov updated KAFKA-1675:
---
Status: Patch Available  (was: Open)

> bootstrapping tidy-up
> -
>
> Key: KAFKA-1675
> URL: https://issues.apache.org/jira/browse/KAFKA-1675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Szczepan Faber
>Assignee: Ivan Lyutov
> Fix For: 0.8.2
>
> Attachments: KAFKA-1675.patch
>
>
> I'd like to suggest the following changes:
> 1. Remove the 'gradlew' and 'gradlew.bat' scripts from the source tree. Those 
> scripts don't work, e.g. they fail with an exception when invoked. I just got 
> a user report where those scripts were invoked by the user and it led to an 
> exception that was not easy to grasp. The bootstrapping step will generate 
> those files anyway.
> 2. Move the 'gradleVersion' extra property from 'build.gradle' into 
> 'gradle.properties'. Otherwise it is hard to automate the bootstrapping 
> process: in order to find out the Gradle version, I need to evaluate the 
> build script, and for that I need Gradle with the correct version (kind of a 
> vicious circle). Project properties declared in the gradle.properties file 
> can be accessed exactly the same way as 'ext' properties, for example: 
> 'project.gradleVersion'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1675) bootstrapping tidy-up

2014-10-06 Thread Ivan Lyutov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160314#comment-14160314
 ] 

Ivan Lyutov commented on KAFKA-1675:


Created reviewboard https://reviews.apache.org/r/26362/diff/
 against branch apache/trunk

> bootstrapping tidy-up
> -
>
> Key: KAFKA-1675
> URL: https://issues.apache.org/jira/browse/KAFKA-1675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Szczepan Faber
>Assignee: Ivan Lyutov
> Fix For: 0.8.2
>
> Attachments: KAFKA-1675.patch
>
>
> I'd like to suggest the following changes:
> 1. Remove the 'gradlew' and 'gradlew.bat' scripts from the source tree. Those 
> scripts don't work, e.g. they fail with an exception when invoked. I just got 
> a user report where those scripts were invoked by the user and it led to an 
> exception that was not easy to grasp. The bootstrapping step will generate 
> those files anyway.
> 2. Move the 'gradleVersion' extra property from 'build.gradle' into 
> 'gradle.properties'. Otherwise it is hard to automate the bootstrapping 
> process: in order to find out the Gradle version, I need to evaluate the 
> build script, and for that I need Gradle with the correct version (kind of a 
> vicious circle). Project properties declared in the gradle.properties file 
> can be accessed exactly the same way as 'ext' properties, for example: 
> 'project.gradleVersion'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 26362: Patch for KAFKA-1675

2014-10-06 Thread Ivan Lyutov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26362/
---

Review request for kafka.


Bugs: KAFKA-1675
https://issues.apache.org/jira/browse/KAFKA-1675


Repository: kafka


Description
---

KAFKA-1675 - moved gradleVersion to gradle.properties, removed gradlew scripts


Diffs
-

  build.gradle 2e488a1ab0437e6aaf3221a938690cd2d98ecda8 
  gradle.properties 5d3155fd4461438d8b2ec4faa9534cc2383d4951 
  gradlew 91a7e269e19dfc62e27137a0b57ef3e430cee4fd 
  gradlew.bat aec99730b4e8fcd90b57a0e8e01544fea7c31a89 

Diff: https://reviews.apache.org/r/26362/diff/


Testing
---


Thanks,

Ivan Lyutov



[jira] [Updated] (KAFKA-1675) bootstrapping tidy-up

2014-10-06 Thread Ivan Lyutov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Lyutov updated KAFKA-1675:
---
Attachment: KAFKA-1675.patch

> bootstrapping tidy-up
> -
>
> Key: KAFKA-1675
> URL: https://issues.apache.org/jira/browse/KAFKA-1675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Szczepan Faber
>Assignee: Ivan Lyutov
> Fix For: 0.8.2
>
> Attachments: KAFKA-1675.patch
>
>
> I'd like to suggest the following changes:
> 1. Remove the 'gradlew' and 'gradlew.bat' scripts from the source tree. Those 
> scripts don't work, e.g. they fail with an exception when invoked. I just got 
> a user report where those scripts were invoked by the user and it led to an 
> exception that was not easy to grasp. The bootstrapping step will generate 
> those files anyway.
> 2. Move the 'gradleVersion' extra property from 'build.gradle' into 
> 'gradle.properties'. Otherwise it is hard to automate the bootstrapping 
> process: in order to find out the Gradle version, I need to evaluate the 
> build script, and for that I need Gradle with the correct version (kind of a 
> vicious circle). Project properties declared in the gradle.properties file 
> can be accessed exactly the same way as 'ext' properties, for example: 
> 'project.gradleVersion'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1674) auto.create.topics.enable docs are misleading

2014-10-06 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic updated KAFKA-1674:

Description: 
{{auto.create.topics.enable}} is currently 
[documented|http://kafka.apache.org/08/configuration.html] with
{quote}
Enable auto creation of topic on the server. If this is set to true then 
attempts to produce, consume, or fetch metadata for a non-existent topic will 
automatically create it with the default replication factor and number of 
partitions.
{quote}

In reality, in Kafka 0.8.1.1, topics are only created when trying to publish a 
message to a non-existing topic.

After 
[discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
 with [~junrao], the conclusion was that it's a documentation issue which needs 
to be fixed.

Before fixing the docs, please check once more whether this is just non-working 
functionality.

  was:
{{auto.create.topics.enable}} is currently 
[documented|http://kafka.apache.org/08/configuration.html] with
{quote}
Enable auto creation of topic on the server. If this is set to true then 
attempts to produce, consume, or fetch metadata for a non-existent topic will 
automatically create it with the default replication factor and number of 
partitions.
{quote}

In reality, in Kafka 0.8.1.1, topics are only created when trying to publish a 
message to a non-existing topic.

After 
[discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
 with [~junrao], the conclusion was that it's a documentation issue which needs 
to be fixed.

Please check once more whether this is just non-working functionality. If it is 
a docs-only issue, and implicit topic creation should work only for the 
producer, consider moving {{auto.create.topics.enable}} and maybe also 
{{num.partitions}}, {{default.replication.factor}}, and any other topic 
auto-creation-related configuration properties (docs only, but maybe code 
also?) from the broker configuration options to the producer configuration 
options.


> auto.create.topics.enable docs are misleading
> -
>
> Key: KAFKA-1674
> URL: https://issues.apache.org/jira/browse/KAFKA-1674
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Stevo Slavic
>Priority: Minor
>
> {{auto.create.topics.enable}} is currently 
> [documented|http://kafka.apache.org/08/configuration.html] with
> {quote}
> Enable auto creation of topic on the server. If this is set to true then 
> attempts to produce, consume, or fetch metadata for a non-existent topic will 
> automatically create it with the default replication factor and number of 
> partitions.
> {quote}
> In reality, in Kafka 0.8.1.1, topics are only created when trying to publish 
> a message to a non-existing topic.
> After 
> [discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
>  with [~junrao], the conclusion was that it's a documentation issue which 
> needs to be fixed.
> Before fixing the docs, please check once more whether this is just 
> non-working functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1673) potential java.lang.IllegalStateException in BufferPool.allocate()

2014-10-06 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1673:
---
Resolution: Fixed
  Reviewer: Jay Kreps
Status: Resolved  (was: Patch Available)

Thanks for the review. Committed to both trunk and 0.8.2.

> potential  java.lang.IllegalStateException in BufferPool.allocate()
> ---
>
> Key: KAFKA-1673
> URL: https://issues.apache.org/jira/browse/KAFKA-1673
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Jun Rao
> Fix For: 0.8.2
>
> Attachments: KAFKA-1673.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: changed List type in RequestPurgatory.Watchers

2014-10-06 Thread mdykman
GitHub user mdykman opened a pull request:

https://github.com/apache/kafka/pull/34

changed List type in RequestPurgatory.Watchers

The class Watchers is declared with the member

    private val requests = new util.ArrayList[T]

In purgeSatisfied(), an iterator is created which conditionally removes 
elements from the list as it goes (RequestPurgatory.scala, lines 193-201):

    val iter = requests.iterator()
    var purged = 0
    while(iter.hasNext) {
      val curr = iter.next
      if(curr.satisfied.get()) {
        iter.remove()
        purged += 1
      }
    }

Using the .remove operation on an ArrayList iterator is very expensive, as the 
ArrayList promises a contiguous backing array and all higher elements must be 
shifted on every removal.

A LinkedList is optimized for the removal of arbitrary elements. Therefore 
recommending:

    -private val requests = new util.ArrayList[T]
    +private val requests = new util.LinkedList[T]


This case was identified due to an observed failure in a high-load production 
environment, where one or more cores were found pinned (i.e. at 100% usage) 
inside this loop. Once the pin was established, it tended to remain until the 
system was restarted.

As the loop is within a synchronized block, checkAndMaybeAdd (0.8.2) or add 
(0.8.1), which are also synchronized on the same object, may become blocked by 
this inefficiency.
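A minimal, runnable Java sketch of the loop above (not Kafka's actual code; a 
hypothetical even-value check stands in for curr.satisfied.get()). The 
iterator-based removal behaves identically on both list types; only the cost of 
iter.remove() differs: an O(n) shift of the backing array for ArrayList versus 
an O(1) node unlink for LinkedList.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class PurgeSketch {
    // Mirrors the purgeSatisfied() loop: walk the list, removing
    // "satisfied" entries via the iterator as we go.
    static int purgeSatisfied(List<Integer> requests) {
        int purged = 0;
        Iterator<Integer> iter = requests.iterator();
        while (iter.hasNext()) {
            int curr = iter.next();
            if (curr % 2 == 0) { // hypothetical stand-in for curr.satisfied.get()
                // O(n) element shift on ArrayList; O(1) unlink on LinkedList
                iter.remove();
                purged++;
            }
        }
        return purged;
    }

    public static void main(String[] args) {
        List<Integer> arrayBacked = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        List<Integer> linkedBacked = new LinkedList<>(List.of(1, 2, 3, 4, 5));
        System.out.println(purgeSatisfied(arrayBacked));  // prints 2
        System.out.println(purgeSatisfied(linkedBacked)); // prints 2
        System.out.println(arrayBacked);                  // prints [1, 3, 5]
    }
}
```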

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mdykman/kafka 0.8.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/34.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #34


commit ade6abea822b87ac72309ecd6c43cfe4adc81e22
Author: Michael Dykman 
Date:   2014-10-03T20:11:14Z

changed List type in RequestPurgatory.Worker




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: Kafka-trunk #289

2014-10-06 Thread Apache Jenkins Server
See 

Changes:

[junrao] kafka-1673; potential java.lang.IllegalStateException in 
BufferPool.allocate(); patched by Jun Rao; reviewed by Jay Kreps

--
[...truncated 1694 lines...]

kafka.admin.AdminTest > testPartitionReassignmentNonOverlappingReplicas FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.&lt;init&gt;(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest > testReassigningNonExistingPartition FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.&lt;init&gt;(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest > testResumePartitionReassignmentThatWasCompleted FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.&lt;init&gt;(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest > testPreferredReplicaJsonData FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.&lt;init&gt;(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest > testBasicPreferredReplicaElection FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.&lt;init&gt;(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest > testShutdownBroker FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest > testTopicConfigChange FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
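Every failure above is the same BindException: each test's embedded ZooKeeper tries to bind a port that a previous run still holds. A common remedy for this class of flakiness (a general sketch, not what this test harness actually does) is to bind to port 0 so the OS assigns a free ephemeral port:

```java
import java.io.IOException;
import java.net.ServerSocket;

class EphemeralPortDemo {
    public static void main(String[] args) throws IOException {
        // Binding to port 0 lets the kernel pick an unused ephemeral port,
        // so concurrent or back-to-back test runs cannot collide on a
        // hard-coded port and fail with "Address already in use".
        try (ServerSocket socket = new ServerSocket(0)) {
            int port = socket.getLocalPort();
            System.out.println("bound=" + (port > 0));
        }
    }
}
```

The test would then read the assigned port back from the socket (or server) instead of assuming a fixed one.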

[jira] [Updated] (KAFKA-1668) TopicCommand doesn't warn if --topic argument doesn't match any topics

2014-10-06 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1668:
-
Reviewer: Jay Kreps

[~jkreps] Feel free to reassign for review.

> TopicCommand doesn't warn if --topic argument doesn't match any topics
> --
>
> Key: KAFKA-1668
> URL: https://issues.apache.org/jira/browse/KAFKA-1668
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Ryan Berdeen
>Assignee: Manikumar Reddy
>Priority: Minor
>  Labels: newbie
> Attachments: KAFKA-1668.patch
>
>
> Running {{kafka-topics.sh --alter}} with an invalid {{--topic}} argument 
> produces no output and exits with 0, indicating success.
> {code}
> $ bin/kafka-topics.sh --topic does-not-exist --alter --config invalid=xxx 
> --zookeeper zkhost:2181
> $ echo $?
> 0
> {code}
> An invalid topic name or a regular expression that matches 0 topics should at 
> least print a warning.
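A minimal sketch of the requested behavior (illustrative only; the names and structure are hypothetical, not TopicCommand's actual code): treat the --topic argument as a regex, and warn when it matches nothing instead of silently succeeding.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TopicMatchCheck {
    // Hypothetical helper: topics matching the --topic argument,
    // interpreted as a regex the way kafka-topics.sh does.
    static List<String> matching(List<String> allTopics, String topicArg) {
        Pattern p = Pattern.compile(topicArg);
        return allTopics.stream()
                        .filter(t -> p.matcher(t).matches())
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> existing = List.of("orders", "clicks");
        List<String> hits = matching(existing, "does-not-exist");
        if (hits.isEmpty()) {
            // The fix the ticket asks for: print a warning; a real CLI
            // would likely also exit nonzero, e.g. System.exit(1).
            System.out.println("WARN: the --topic argument matched no topics");
        }
    }
}
```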



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [Java New Producer Kafka Trunk ] Need a State Check API Method

2014-10-06 Thread Jay Kreps
Hey Bhavesh,

This is a sanity check. If you send a message after calling close on the
producer you should get this error. It sounds like you have multiple
threads sending, and you close the producer in the middle of this, then you
get this error. This is expected.

Perhaps I am misunderstanding?

I think tracking the state (i.e. whether you have called close or not) can
be done just as easily in your code, right?

-Jay
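Jay's suggestion of tracking closed state in application code can be sketched as a thin wrapper around the producer (a hedged sketch with illustrative names; the producer calls are shown as comments):

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ClosableSender {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    boolean trySend(String record) {
        if (closed.get()) {
            // Refuse instead of letting the producer throw
            // IllegalStateException after close.
            return false;
        }
        // producer.send(new ProducerRecord<>(...)) would go here
        return true;
    }

    void close() {
        if (closed.compareAndSet(false, true)) {
            // producer.close() would go here
        }
    }

    public static void main(String[] args) {
        ClosableSender s = new ClosableSender();
        System.out.println(s.trySend("a")); // true: still open
        s.close();
        System.out.println(s.trySend("b")); // false: refused after close
    }
}
```

Note this check-then-send is still racy under concurrent close, which is exactly the in-flight window Bhavesh describes below; a fully safe wrapper would need to serialize send and close on a lock.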

On Sun, Oct 5, 2014 at 7:32 PM, Bhavesh Mistry 
wrote:

> Hi Kafka Dev Team,
>
> *java.lang.IllegalStateException: Cannot send after the producer is closed.*
>
> The above seems to be a bug.  If a ProducerRecord is in flight inside the
> send method while another thread closes the producer mid-flight, the send
> will get this error.
>
> Thanks,
>
> Bhavesh
>
> On Sun, Oct 5, 2014 at 7:15 PM, Bhavesh Mistry  >
> wrote:
>
> > Hi Kafka Dev Team,
> >
> > The use case is that we need to know producer state in background Threads
> > and so we can submit the message.
> >
> > This seems to be a bug in trunk code.  I have noticed that KafkaProducer
> > itself does not expose a closed state, and in-flight messages will
> > encounter the following issue.  Should I file a bug for this?
> >
> > java.lang.IllegalStateException: Cannot send after the producer is
> closed.
> > at
> >
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:136)
> > at
> >
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:237)
> > .
> > at java.util.TimerThread.mainLoop(Timer.java:555)
> > at java.util.TimerThread.run(Timer.java:505)
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Sun, Oct 5, 2014 at 3:30 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com
> > > wrote:
> >
> >> HI Kafka Dev,
> >>
> >> I would like to request a state-check API so I can manage the
> >> producer's life cycle better.  If you agree, I will file a JIRA
> >> request.  Given the producer's state, I could start (create a new
> >> producer instance), restart, or close it.  The states below are just an
> >> example; you may add or remove states.
> >>
> >> /***
> >>
> >> * API TO CHECK STATE OF PRODUCER
> >>
> >> *  @Return
> >>
> >>
> >>
> >>  STATE.INIT_IN_PROGRESS
> >>
> >>  STATE.INIT_DONE
> >>
> >>  STATE.RUNNING
> >>
> >>  STATE.CLOSE_REQUESTED
> >>
> >>  STATE.CLOSE_IN_PROGRESS
> >>
> >>  STATE.CLOSED
> >>
> >> */
> >>
> >> public State getCurrentState();
> >>
> >> Thanks,
> >>
> >> Bhavesh
> >>
> >
> >
>


[jira] [Updated] (KAFKA-1430) Purgatory redesign

2014-10-06 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-1430:
-
Priority: Blocker  (was: Major)

> Purgatory redesign
> --
>
> Key: KAFKA-1430
> URL: https://issues.apache.org/jira/browse/KAFKA-1430
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430.patch, 
> KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430_2014-06-05_17:07:37.patch, 
> KAFKA-1430_2014-06-05_17:13:13.patch, KAFKA-1430_2014-06-05_17:40:53.patch, 
> KAFKA-1430_2014-06-10_11:22:06.patch, KAFKA-1430_2014-06-10_11:26:02.patch, 
> KAFKA-1430_2014-07-11_10:59:13.patch, KAFKA-1430_2014-07-21_12:53:39.patch, 
> KAFKA-1430_2014-07-25_09:52:43.patch, KAFKA-1430_2014-07-28_11:30:23.patch, 
> KAFKA-1430_2014-07-31_15:04:33.patch, KAFKA-1430_2014-08-05_14:54:21.patch
>
>
> We have seen 2 main issues with the Purgatory.
> 1. There is no atomic checkAndWatch functionality. So, a client typically 
> first checks whether a request is satisfied or not and then register the 
> watcher. However, by the time the watcher is registered, the registered item 
> could already be satisfied. This item won't be satisfied until the next 
> update happens or the delayed time expires, which means the watched item 
> could be delayed. 
> 2. FetchRequestPurgatory doesn't quite work. This is because the current 
> design tries to incrementally maintain the accumulated bytes ready for fetch. 
> However, this is difficult since the right time to check whether a fetch (for 
> regular consumer) request is satisfied is when the high watermark moves. At 
> that point, it's hard to figure out how many bytes we should incrementally 
> add to each pending fetch request.
> The problem has been reported in KAFKA-1150 and KAFKA-703.
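Issue 1 above (no atomic checkAndWatch) can be sketched as follows: the satisfaction check and the watcher registration happen under one lock, so an item satisfied concurrently can never slip through unwatched. This is an illustrative toy, not the actual RequestPurgatory API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class Purgatory {
    private final List<Runnable> watchers = new ArrayList<>();
    private final Object lock = new Object();

    /** Returns true if already satisfied; otherwise registers the watcher.
     *  Both steps run under the same lock, closing the race the ticket
     *  describes between "check" and "register". */
    boolean checkAndMaybeWatch(Supplier<Boolean> satisfied, Runnable onSatisfied) {
        synchronized (lock) {
            if (satisfied.get()) return true;
            watchers.add(onSatisfied);
            return false;
        }
    }

    public static void main(String[] args) {
        Purgatory p = new Purgatory();
        System.out.println(p.checkAndMaybeWatch(() -> true, () -> {}));  // true: already satisfied
        System.out.println(p.checkAndMaybeWatch(() -> false, () -> {})); // false: watcher registered
        System.out.println(p.watchers.size());                           // 1
    }
}
```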





[jira] [Reopened] (KAFKA-1430) Purgatory redesign

2014-10-06 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps reopened KAFKA-1430:
--

This change reverted KAFKA-1468. That was the issue where we had huge pauses 
due to using remove() on ArrayList in a tight loop. The fix should be pretty 
simple--just change back to using LinkedList (not sure if too much else 
changed). But since it looks like that rebase/merge was done without examining 
the changes, it would be good for someone to go through carefully and see if 
anything else got dropped.



Re: Review Request 26346: Patch for KAFKA-1670

2014-10-06 Thread Sriharsha Chintalapani


> On Oct. 5, 2014, 11:35 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/log/Log.scala, line 499
> > 
> >
> > Would it be simpler to just do the following?
> > 
> > if (segment.size + messageSize < (long) config.segmentSize)

That would cause it to roll every time a message batch is appended. Say 
segment.size is 10, messageSize is 10, and config.segmentSize is 2147483647: 
the above check would pass and roll the current segment without ever reaching 
config.segmentSize.


- Sriharsha
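The arithmetic being debated can be made concrete. The sketch below (my illustration, not the committed patch) shows why the size check must be done in long: with segment.bytes near Int.MaxValue, `segment.size + messageSize` computed in int wraps negative, so an int-based check can never detect that the segment is full.

```java
class RollCheckDemo {
    // Hedged sketch of a roll predicate: widen to long before adding so
    // values near Integer.MAX_VALUE cannot overflow and wrap negative.
    static boolean shouldRoll(int segmentSize, int messageSetSize, int configSegmentSize) {
        return (long) segmentSize + messageSetSize > configSegmentSize;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE; // 2147483647, the segment.bytes ceiling
        // int arithmetic wraps negative here, so the sum looks "small"...
        System.out.println(max - 10 + 100 < max);           // true: overflowed
        // ...while the long version correctly reports a roll is needed.
        System.out.println(shouldRoll(max - 10, 100, max)); // true
    }
}
```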


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55479
---


On Oct. 5, 2014, 3:17 a.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> ---
> 
> (Updated Oct. 5, 2014, 3:17 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
> https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/log/Log.scala 
> 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 
> 577d102fc2eb6bb1a72326141ecd431db6d66f04 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



Re: [GitHub] kafka pull request: changed List type in RequestPurgatory.Watchers

2014-10-06 Thread Jay Kreps
Hey Michael,

Nice catch. We had actually fixed this issue once already, but it looks
like the change was lost. Re-opened KAFKA-1430 to track:

https://issues.apache.org/jira/browse/KAFKA-1430

-Jay

On Mon, Oct 6, 2014 at 7:58 AM, mdykman  wrote:

> GitHub user mdykman opened a pull request:
>
> https://github.com/apache/kafka/pull/34
>
> changed List type in RequestPurgatory.Watchers
>
>  class Watchers is declared, having member
>
> private val requests = new util.ArrayList[T]
>
> in purgeSatisfied(), an iterator is created which conditionally
> removes elements from the list as it goes:
>
>  RequestPurgatory.scala, lines 193-201:
>
>val iter = requests.iterator()
> var purged = 0
> while(iter.hasNext) {
>   val curr = iter.next
>   if(curr.satisfied.get()) {
> iter.remove()
> purged += 1
>   }
> }
>
> Using the .remove operation on an ArrayList iterator is very expensive
> as the ArrayList promises a contiguous backing array and all higher
> elements must be shifted on every operation.
>
> A LinkedList is optimized for the removal of arbitrary elements.
> Therefore recommending:
>
> -private val requests = new util.ArrayList[T]
> +private val requests = new util.LinkedList[T]
>
>
> This case was identified after an observed failure in a high-load
> production environment, where one or more cores were pinned (i.e. at 100%
> usage) inside this loop.  Once the pin was established, it tended to
> remain until the system was restarted.
>
> As the loop is within a synchronized block, checkAndMaybeAdd (0.8.2)
> or add (0.8.1) which are also synchronized on the same object, may become
> blocked by this inefficiency.
>
> You can merge this pull request into a Git repository by running:
>
> $ git pull https://github.com/mdykman/kafka 0.8.2
>
> Alternatively you can review and apply these changes as the patch at:
>
> https://github.com/apache/kafka/pull/34.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
> This closes #34
>
> 
> commit ade6abea822b87ac72309ecd6c43cfe4adc81e22
> Author: Michael Dykman 
> Date:   2014-10-03T20:11:14Z
>
> changed List type in RequestPurgatory.Worker
>
> 
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>
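The purge loop from the pull request, transliterated to Java over a LinkedList (an illustrative stand-in: integers replace requests, and an evenness test replaces `curr.satisfied.get()`). Here `Iterator.remove()` just unlinks a node in O(1), whereas on an ArrayList each removal shifts every higher element of the backing array.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class PurgeDemo {
    public static void main(String[] args) {
        List<Integer> requests = new LinkedList<>(List.of(1, 2, 3, 4, 5, 6));
        int purged = 0;
        Iterator<Integer> iter = requests.iterator();
        while (iter.hasNext()) {
            int curr = iter.next();
            if (curr % 2 == 0) { // stands in for curr.satisfied.get()
                iter.remove();   // O(1) unlink on a LinkedList
                purged++;
            }
        }
        System.out.println(purged);   // 3
        System.out.println(requests); // [1, 3, 5]
    }
}
```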


[jira] [Commented] (KAFKA-1670) Corrupt log files for segment.bytes values close to Int.MaxInt

2014-10-06 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160484#comment-14160484
 ] 

Sriharsha Chintalapani commented on KAFKA-1670:
---

Updated reviewboard https://reviews.apache.org/r/26346/diff/
 against branch origin/trunk

> Corrupt log files for segment.bytes values close to Int.MaxInt
> --
>
> Key: KAFKA-1670
> URL: https://issues.apache.org/jira/browse/KAFKA-1670
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Ryan Berdeen
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1670.patch, KAFKA-1670_2014-10-04_20:17:46.patch, 
> KAFKA-1670_2014-10-06_09:48:25.patch
>
>
> The maximum value for the topic-level config {{segment.bytes}} is 
> {{Int.MaxInt}} (2147483647). *Using this value causes brokers to corrupt 
> their log files, leaving them unreadable.*
> We set {{segment.bytes}} to {{2122317824}} which is well below the maximum. 
> One by one, the ISR of all partitions shrunk to 1. Brokers would crash when 
> restarted, attempting to read from a negative offset in a log file. After 
> discovering that many segment files had grown to 4GB or more, we were forced 
> to shut down our *entire production Kafka cluster* for several hours while we 
> split all segment files into 1GB chunks.
> Looking into the {{kafka.log}} code, the {{segment.bytes}} parameter is used 
> inconsistently. It is treated as a *soft* maximum for the size of the segment 
> file 
> (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/LogConfig.scala#L26)
>  with logs rolled only after 
> (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/Log.scala#L246)
>  they exceed this value. However, much of the code that deals with log files 
> uses *ints* to store the size of the file and the position in the file. 
> Overflow of these ints leads the broker to append to the segments 
> indefinitely, and to fail to read these segments for consuming or recovery.
> This is trivial to reproduce:
> {code}
> $ bin/kafka-topics.sh --topic segment-bytes-test --create 
> --replication-factor 2 --partitions 1 --zookeeper zkhost:2181
> $ bin/kafka-topics.sh --topic segment-bytes-test --alter --config 
> segment.bytes=2147483647 --zookeeper zkhost:2181
> $ yes "Int.MaxValue is a ridiculous bound on file size in 2014" | 
> bin/kafka-console-producer.sh --broker-list localhost:6667 zkhost:2181 
> --topic segment-bytes-test
> {code}
> After running for a few minutes, the log file is corrupt:
> {code}
> $ ls -lh data/segment-bytes-test-0/
> total 9.7G
> -rw-r--r-- 1 root root  10M Oct  3 19:39 .index
> -rw-r--r-- 1 root root 9.7G Oct  3 19:39 .log
> {code}
> We recovered the data from the log files using a simple Python script: 
> https://gist.github.com/also/9f823d9eb9dc0a410796
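The int-overflow failure mode described above is easy to demonstrate: once a file position stored as an int crosses 2 GiB it wraps negative, which matches the report of brokers attempting to read from a negative offset. (My illustration of the mechanism, not code from kafka.log.)

```java
class PositionOverflowDemo {
    public static void main(String[] args) {
        // File size/position kept in an int wraps negative past 2 GiB...
        int positionInt = Integer.MAX_VALUE;
        positionInt += 1;                       // crosses the 2 GiB boundary
        System.out.println(positionInt < 0);    // true: wrapped negative
        // ...while a long holds the real position without trouble.
        long positionLong = (long) Integer.MAX_VALUE + 1;
        System.out.println(positionLong);       // 2147483648
    }
}
```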





[jira] [Updated] (KAFKA-1670) Corrupt log files for segment.bytes values close to Int.MaxInt

2014-10-06 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-1670:
--
Attachment: KAFKA-1670_2014-10-06_09:48:25.patch



Re: Review Request 26346: Patch for KAFKA-1670

2014-10-06 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/
---

(Updated Oct. 6, 2014, 4:48 p.m.)


Review request for kafka.


Bugs: KAFKA-1670
https://issues.apache.org/jira/browse/KAFKA-1670


Repository: kafka


Description
---

KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


Diffs (updated)
-

  core/src/main/scala/kafka/log/Log.scala 
0ddf97bd30311b6039e19abade41d2fbbad2f59b 

Diff: https://reviews.apache.org/r/26346/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



[jira] [Commented] (KAFKA-1430) Purgatory redesign

2014-10-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160500#comment-14160500
 ] 

Guozhang Wang commented on KAFKA-1430:
--

I could go through the changes again, but was KAFKA-1468 ever committed?



[jira] [Commented] (KAFKA-1430) Purgatory redesign

2014-10-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160503#comment-14160503
 ] 

Guozhang Wang commented on KAFKA-1430:
--

Cool. I saw that in the commit log, though the ticket is not closed yet.



[jira] [Updated] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts

2014-10-06 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1647:

Status: Patch Available  (was: Open)

> Replication offset checkpoints (high water marks) can be lost on hard kills 
> and restarts
> 
>
> Key: KAFKA-1647
> URL: https://issues.apache.org/jira/browse/KAFKA-1647
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2
>Reporter: Joel Koshy
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: newbie++
> Attachments: KAFKA-1647.patch
>
>
> We ran into this scenario recently in a production environment. This can 
> happen when enough brokers in a cluster are taken down. i.e., a rolling 
> bounce done properly should not cause this issue. It can occur if all 
> replicas for any partition are taken down.
> Here is a sample scenario:
> * Cluster of three brokers: b0, b1, b2
> * Two partitions (of some topic) with replication factor two: p0, p1
> * Initial state:
> p0: leader = b0, ISR = {b0, b1}
> p1: leader = b1, ISR = {b0, b1}
> * Do a parallel hard-kill of all brokers
> * Bring up b2, so it is the new controller
> * b2 initializes its controller context and populates its leader/ISR cache 
> (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last 
> known leaders are b0 (for p0) and b1 (for p1)
> * Bring up b1
> * The controller's onBrokerStartup procedure initiates a replica state change 
> for all replicas on b1 to become online. As part of this replica state change 
> it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 
> (for p0 and p1). This LeaderAndIsr request contains: {p0: leader=b0; p1: 
> leader=b1; leaders=b1}. b0 is indicated as the leader of p0 but it is not 
> included in the leaders field because b0 is down.
> * On receiving the LeaderAndIsrRequest, b1's replica manager will 
> successfully make itself (b1) the leader for p1 (and create the local replica 
> object corresponding to p1). It will however abort the become follower 
> transition for p0 because the designated leader b0 is offline. So it will not 
> create the local replica object for p0.
> * It will then start the high water mark checkpoint thread. Since only p1 has 
> a local replica object, only p1's high water mark will be checkpointed to 
> disk. p0's previously written checkpoint, if any, will be lost.
> So in summary it seems we should always create the local replica object even 
> if the online transition does not happen.
> Possible symptoms of the above bug could be one or more of the following (we 
> saw 2 and 3):
> # Data loss; yes on a hard-kill data loss is expected, but this can actually 
> cause loss of nearly all data if the broker becomes follower, truncates, and 
> soon after happens to become leader.
> # High IO on brokers that lose their high water mark then subsequently (on a 
> successful become follower transition) truncate their log to zero and start 
> catching up from the beginning.
> # If the offsets topic is affected, then offsets can get reset. This is 
> because during an offset load we don't read past the high water mark. So if a 
> water mark is missing then we don't load anything (even if the offsets are 
> there in the log).
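The fix direction suggested in the summary ("always create the local replica object") implies the checkpoint writer should never drop a partition just because its replica is offline. One hedged way to sketch that invariant (names and structure are hypothetical, not ReplicaManager's code): carry forward the previously checkpointed value for any partition with no live replica.

```java
import java.util.HashMap;
import java.util.Map;

class HwCheckpointDemo {
    // Merge the new high watermarks over the previous checkpoint, so
    // partitions with no local replica object keep their old values
    // instead of silently disappearing from the checkpoint file.
    static Map<String, Long> nextCheckpoint(Map<String, Long> previous,
                                            Map<String, Long> liveReplicaHws) {
        Map<String, Long> out = new HashMap<>(previous); // keep old entries
        out.putAll(liveReplicaHws);                      // overwrite live ones
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> prev = Map.of("p0", 42L, "p1", 10L);
        Map<String, Long> live = Map.of("p1", 15L); // p0 has no replica object
        Map<String, Long> next = nextCheckpoint(prev, live);
        System.out.println(next.get("p0")); // 42: p0's watermark is not lost
        System.out.println(next.get("p1")); // 15: p1's watermark is updated
    }
}
```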





[jira] [Updated] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts

2014-10-06 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1647:

Attachment: KAFKA-1647.patch



Review Request 26373: Patch for KAFKA-1647

2014-10-06 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26373/
---

Review request for kafka.


Bugs: KAFKA-1647
https://issues.apache.org/jira/browse/KAFKA-1647


Repository: kafka


Description
---

Fix for Kafka-1647.


Diffs
-

  core/src/main/scala/kafka/server/ReplicaManager.scala 
78b7514cc109547c562e635824684fad581af653 

Diff: https://reviews.apache.org/r/26373/diff/


Testing
---


Thanks,

Jiangjie Qin



[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts

2014-10-06 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160510#comment-14160510
 ] 

Jiangjie Qin commented on KAFKA-1647:
-

Created reviewboard https://reviews.apache.org/r/26373/diff/
 against branch origin/trunk

> Replication offset checkpoints (high water marks) can be lost on hard kills 
> and restarts
> 
>
> Key: KAFKA-1647
> URL: https://issues.apache.org/jira/browse/KAFKA-1647
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2
>Reporter: Joel Koshy
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: newbie++
> Attachments: KAFKA-1647.patch
>
>
> We ran into this scenario recently in a production environment. This can 
> happen when enough brokers in a cluster are taken down. i.e., a rolling 
> bounce done properly should not cause this issue. It can occur if all 
> replicas for any partition are taken down.
> Here is a sample scenario:
> * Cluster of three brokers: b0, b1, b2
> * Two partitions (of some topic) with replication factor two: p0, p1
> * Initial state:
> p0: leader = b0, ISR = {b0, b1}
> p1: leader = b1, ISR = {b0, b1}
> * Do a parallel hard-kill of all brokers
> * Bring up b2, so it is the new controller
> * b2 initializes its controller context and populates its leader/ISR cache 
> (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last 
> known leaders are b0 (for p0) and b1 (for p1)
> * Bring up b1
> * The controller's onBrokerStartup procedure initiates a replica state change 
> for all replicas on b1 to become online. As part of this replica state change 
> it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 
> (for p0 and p1). This LeaderAndIsr request contains: {{p0: leader=b0; p1: 
> leader=b1; leaders=[b1]}}. b0 is indicated as the leader of p0 but it is not 
> included in the leaders field because b0 is down.
> * On receiving the LeaderAndIsrRequest, b1's replica manager will 
> successfully make itself (b1) the leader for p1 (and create the local replica 
> object corresponding to p1). It will however abort the become follower 
> transition for p0 because the designated leader b0 is offline. So it will not 
> create the local replica object for p0.
> * It will then start the high water mark checkpoint thread. Since only p1 has 
> a local replica object, only p1's high water mark will be checkpointed to 
> disk. p0's previously written checkpoint, if any, will be lost.
> So in summary it seems we should always create the local replica object even 
> if the online transition does not happen.
> Possible symptoms of the above bug could be one or more of the following (we 
> saw 2 and 3):
> # Data loss; yes on a hard-kill data loss is expected, but this can actually 
> cause loss of nearly all data if the broker becomes follower, truncates, and 
> soon after happens to become leader.
> # High IO on brokers that lose their high water mark then subsequently (on a 
> successful become follower transition) truncate their log to zero and start 
> catching up from the beginning.
> # If the offsets topic is affected, then offsets can get reset. This is 
> because during an offset load we don't read past the high water mark. So if a 
> water mark is missing then we don't load anything (even if the offsets are 
> there in the log).





[jira] [Commented] (KAFKA-1430) Purgatory redesign

2014-10-06 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160513#comment-14160513
 ] 

Jay Kreps commented on KAFKA-1430:
--

Yes it was committed.

> Purgatory redesign
> --
>
> Key: KAFKA-1430
> URL: https://issues.apache.org/jira/browse/KAFKA-1430
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430.patch, 
> KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430_2014-06-05_17:07:37.patch, 
> KAFKA-1430_2014-06-05_17:13:13.patch, KAFKA-1430_2014-06-05_17:40:53.patch, 
> KAFKA-1430_2014-06-10_11:22:06.patch, KAFKA-1430_2014-06-10_11:26:02.patch, 
> KAFKA-1430_2014-07-11_10:59:13.patch, KAFKA-1430_2014-07-21_12:53:39.patch, 
> KAFKA-1430_2014-07-25_09:52:43.patch, KAFKA-1430_2014-07-28_11:30:23.patch, 
> KAFKA-1430_2014-07-31_15:04:33.patch, KAFKA-1430_2014-08-05_14:54:21.patch
>
>
> We have seen 2 main issues with the Purgatory.
> 1. There is no atomic checkAndWatch functionality. So, a client typically 
> first checks whether a request is satisfied or not and then registers the 
> watcher. However, by the time the watcher is registered, the registered item 
> could already be satisfied. This item won't be satisfied until the next 
> update happens or the delayed time expires, which means the watched item 
> could be delayed. 
> 2. FetchRequestPurgatory doesn't quite work. This is because the current 
> design tries to incrementally maintain the accumulated bytes ready for fetch. 
> However, this is difficult since the right time to check whether a fetch (for 
> regular consumer) request is satisfied is when the high watermark moves. At 
> that point, it's hard to figure out how many bytes we should incrementally 
> add to each pending fetch request.
> The problem has been reported in KAFKA-1150 and KAFKA-703.
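The race in point 1 above is a classic check-then-act problem. A minimal Java sketch of an atomic check-and-watch follows; the names (`tryCompleteElseWatch`, `satisfy`) are illustrative, not the actual purgatory API. The key is that the watcher is registered while still holding the lock, so a satisfying update cannot slip in between the check and the registration.

```java
import java.util.function.BooleanSupplier;

public class DelayedOp {
    private boolean completed = false;
    private Runnable watcher;

    // Atomically: complete now if already satisfied, otherwise register the
    // watcher under the same lock so a racing satisfy() cannot be missed.
    public synchronized boolean tryCompleteElseWatch(BooleanSupplier isSatisfied, Runnable onComplete) {
        if (isSatisfied.getAsBoolean()) {
            onComplete.run();
            completed = true;
            return true;
        }
        watcher = onComplete;
        return false;
    }

    // Called when an update may have satisfied the operation.
    public synchronized void satisfy() {
        if (!completed && watcher != null) {
            watcher.run();
            completed = true;
        }
    }
}
```

With the non-atomic version (check, then register), there is a window in which a satisfying update fires before the watcher exists, which is exactly the delayed-item symptom the description reports.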





Re: Review Request 26346: Patch for KAFKA-1670

2014-10-06 Thread Neha Narkhede


> On Oct. 5, 2014, 11:35 p.m., Jun Rao wrote:
> > core/src/test/scala/unit/kafka/log/LogTest.scala, lines 113-129
> > 
> >
> > Appending 2GB of data in a unit test is probably too long. We can 
> > probably just manually validate the fix and skip the unit test.

The concern I have with manual validation only is that over time with enough of 
these bugs, we won't have any coverage for regression testing. Any chance this can 
at least be a system test?


- Neha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55479
---


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> ---
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
> https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/log/Log.scala 
> 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



Re: Review Request 25995: Patch for KAFKA-1650

2014-10-06 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25995/
---

(Updated Oct. 6, 2014, 5:17 p.m.)


Review request for kafka.


Bugs: KAFKA-1650
https://issues.apache.org/jira/browse/KAFKA-1650


Repository: kafka


Description (updated)
---

Addressed Guozhang's comments.


Addressed Guozhang's comments


Diffs (updated)
-

  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
fbc680fde21b02f11285a4f4b442987356abd17b 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
78b7514cc109547c562e635824684fad581af653 
  core/src/main/scala/kafka/tools/MirrorMaker.scala 
b8698ee1469c8fbc92ccc176d916eb3e28b87867 
  core/src/main/scala/kafka/utils/ByteBoundedBlockingQueue.scala PRE-CREATION 

Diff: https://reviews.apache.org/r/25995/diff/


Testing
---


Thanks,

Jiangjie Qin



[jira] [Commented] (KAFKA-1650) Mirror Maker could lose data on unclean shutdown.

2014-10-06 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160526#comment-14160526
 ] 

Jiangjie Qin commented on KAFKA-1650:
-

Updated reviewboard https://reviews.apache.org/r/25995/diff/
 against branch origin/trunk

> Mirror Maker could lose data on unclean shutdown.
> -
>
> Key: KAFKA-1650
> URL: https://issues.apache.org/jira/browse/KAFKA-1650
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1650.patch, KAFKA-1650_2014-10-06_10:17:46.patch
>
>
> Currently, if the mirror maker is shut down uncleanly, the data in the data 
> channel and buffer could potentially be lost. With the new producer's 
> callback, this issue could be solved.
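The callback-based fix alluded to above can be sketched as follows. This is an assumption-laden illustration, not MirrorMaker's actual code: it models the idea that the consumer offset committed is kept just below the smallest offset the producer has not yet acknowledged, so an unclean shutdown cannot lose unacked data.

```java
import java.util.TreeSet;

public class OffsetCommitTracker {
    private final TreeSet<Long> inFlight = new TreeSet<>();
    private long highWater = -1;

    // Record an offset handed to the producer but not yet acknowledged.
    public synchronized void onSend(long offset) {
        inFlight.add(offset);
        if (offset > highWater) highWater = offset;
    }

    // The producer's send callback removes the offset on a successful ack.
    public synchronized void onAck(long offset) {
        inFlight.remove(offset);
    }

    // Safe commit point: everything below the smallest unacked offset has
    // been durably produced, so it is safe to commit even on unclean shutdown.
    public synchronized long safeToCommit() {
        return inFlight.isEmpty() ? highWater : inFlight.first() - 1;
    }

    public static void main(String[] args) {
        OffsetCommitTracker t = new OffsetCommitTracker();
        t.onSend(0); t.onSend(1); t.onSend(2);
        t.onAck(0); t.onAck(2);
        System.out.println(t.safeToCommit()); // 0: offset 1 is still in flight
        t.onAck(1);
        System.out.println(t.safeToCommit()); // 2
    }
}
```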





[jira] [Updated] (KAFKA-1650) Mirror Maker could lose data on unclean shutdown.

2014-10-06 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1650:

Attachment: KAFKA-1650_2014-10-06_10:17:46.patch

> Mirror Maker could lose data on unclean shutdown.
> -
>
> Key: KAFKA-1650
> URL: https://issues.apache.org/jira/browse/KAFKA-1650
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1650.patch, KAFKA-1650_2014-10-06_10:17:46.patch
>
>
> Currently, if the mirror maker is shut down uncleanly, the data in the data 
> channel and buffer could potentially be lost. With the new producer's 
> callback, this issue could be solved.





Re: Review Request 25995: Patch for KAFKA-1650

2014-10-06 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25995/
---

(Updated Oct. 6, 2014, 5:20 p.m.)


Review request for kafka.


Bugs: KAFKA-1650
https://issues.apache.org/jira/browse/KAFKA-1650


Repository: kafka


Description (updated)
---

Addressed Guozhang's comments.

Talked with Joel and decided to remove multi connector support as people can 
always create multiple mirror maker instances if they want to consume from 
multiple clusters.


Diffs
-

  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
fbc680fde21b02f11285a4f4b442987356abd17b 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
78b7514cc109547c562e635824684fad581af653 
  core/src/main/scala/kafka/tools/MirrorMaker.scala 
b8698ee1469c8fbc92ccc176d916eb3e28b87867 
  core/src/main/scala/kafka/utils/ByteBoundedBlockingQueue.scala PRE-CREATION 

Diff: https://reviews.apache.org/r/25995/diff/


Testing
---


Thanks,

Jiangjie Qin



Re: Review Request 26346: Patch for KAFKA-1670

2014-10-06 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55526
---



core/src/main/scala/kafka/log/Log.scala


We probably should enforce that a segment is never larger than 
config.segmentSize. So, if messageSize is larger than config.segmentSize, we 
should just throw an exception. Once we do that, it seems that we can just use 
the following check to cover both conditions.

if (segment.size + messageSize > (long) config.segmentSize)

We likely need to adjust the comment above accordingly.


- Jun Rao


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> ---
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
> https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/log/Log.scala 
> 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2014-10-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160538#comment-14160538
 ] 

Guozhang Wang commented on KAFKA-1367:
--

Regarding the fix to this issue, we can either 1) remove the ISR field from the 
metadata response and hence force people to use the admin tool (with its ZK 
dependency) for such usages, which would also require a protocol change between 
client and server; or 2) let the controller also watch for ISR changes and 
propagate that information to brokers. This would not introduce a protocol 
change for clients, but would likely add a lot of burden on the controller, 
since ISR changes are more frequent than leader migrations.

[~jjkoshy][~junrao] any other thoughts?

> Broker topic metadata not kept in sync with ZooKeeper
> -
>
> Key: KAFKA-1367
> URL: https://issues.apache.org/jira/browse/KAFKA-1367
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Ryan Berdeen
>  Labels: newbie++
> Attachments: KAFKA-1367.txt
>
>
> When a broker is restarted, the topic metadata responses from the brokers 
> will be incorrect (different from ZooKeeper) until a preferred replica leader 
> election.
> In the metadata, it looks like leaders are correctly removed from the ISR 
> when a broker disappears, but followers are not. Then, when a broker 
> reappears, the ISR is never updated.
> I used a variation of the Vagrant setup created by Joe Stein to reproduce 
> this with latest from the 0.8.1 branch: 
> https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c





[jira] [Created] (KAFKA-1676) Ability to cancel replica reassignment in progress

2014-10-06 Thread Ryan Berdeen (JIRA)
Ryan Berdeen created KAFKA-1676:
---

 Summary: Ability to cancel replica reassignment in progress
 Key: KAFKA-1676
 URL: https://issues.apache.org/jira/browse/KAFKA-1676
 Project: Kafka
  Issue Type: New Feature
  Components: controller
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen
Assignee: Neha Narkhede


I've had several situations where I have started a replica reassignment that 
I've needed to cancel before it completed.

This has happened 
* when moving to a new broker that turns out to be running on an impaired server
* if the extra replication hurts cluster performance
* dealing with replication bugs in kafka, like KAFKA-1670
* when a single replica reassignment is taking a long time, and I want to start 
more replica assignments without waiting for the current one to finish.

For the first three cases, as a last resort I have deleted the 
{{/admin/reassign_partitions}} key from ZooKeeper and restarted the controller. 
I would like to be able to do this by signaling the controller to stop, and to 
leave the list of assignments as they exist at that moment.





Re: Review Request 26346: Patch for KAFKA-1670

2014-10-06 Thread Sriharsha Chintalapani


> On Oct. 6, 2014, 5:24 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/log/Log.scala, line 502
> > 
> >
> > We probably should enforce that a segment is never larger than 
> > config.segmentSize. So, if messageSize is larger than config.segmentSize, 
> > we should just throw an exception. Once we do that, it seems that we can 
> > just use the following check to cover both conditions.
> > 
> > if (segment.size + messageSize > (long) config.segmentSize)
> > 
> > We likely need to adjust the comment above accordingly.

Sorry, I am not able to follow. Can you please elaborate on "So, if 
messageSize is larger than config.segmentSize"?
The issue here is not that messageSize is larger than config.segmentSize.
Currently we only roll when segment.size is greater than config.segmentSize, 
and the edge case is this: if config.segmentSize is Int.MaxValue and the 
current segment.size is Int.MaxValue - 1, we still wouldn't roll the segment 
and would append the current batch to the same segment. The next time we 
check, segment.size will have overflowed to a negative value, so it still 
fails the check segment.size > config.segmentSize and we keep appending to 
the same LogSegment.

if (segment.size + messageSize > (long) config.segmentSize)
This condition wouldn't work since segment.size is an Int; if its value is 
anywhere close to Int.MaxValue, adding the current message size will cause it 
to overflow.

We can change the above condition to 
if (segment.size.toLong + messageSize > config.segmentSize) 
and change the comment to: the LogSegment will be rolled if segment.size + 
messagesBatch.size is greater than config.segmentSize. 
Please let me know if these changes look good. Thanks.
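The overflow described here, and the widening fix, can be demonstrated in a few lines of Java; `segmentSize` and `configSegmentSize` are hypothetical stand-ins for `segment.size` and `config.segmentSize`, not the Log.scala code.

```java
public class RollCheck {
    // Buggy check: the int addition overflows to a negative value near
    // Int.MaxValue, so the comparison never becomes true and the segment
    // is never rolled.
    static boolean shouldRollBuggy(int segmentSize, int messageSize, int configSegmentSize) {
        return segmentSize + messageSize > configSegmentSize;
    }

    // Fixed check: widen to long before adding, as proposed in the review.
    static boolean shouldRollFixed(int segmentSize, int messageSize, int configSegmentSize) {
        return (long) segmentSize + messageSize > configSegmentSize;
    }

    public static void main(String[] args) {
        int configSegmentSize = Integer.MAX_VALUE;
        int segmentSize = Integer.MAX_VALUE - 1;
        int messageSize = 1000;
        System.out.println(shouldRollBuggy(segmentSize, messageSize, configSegmentSize)); // false: overflowed
        System.out.println(shouldRollFixed(segmentSize, messageSize, configSegmentSize)); // true
    }
}
```

Note the widening must happen on an operand of the addition; casting only the right-hand side of the comparison still lets the left-hand addition overflow in int arithmetic.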


- Sriharsha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55526
---


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> ---
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
> https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/log/Log.scala 
> 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



[jira] [Created] (KAFKA-1677) Governor on concurrent replica reassignments

2014-10-06 Thread Ryan Berdeen (JIRA)
Ryan Berdeen created KAFKA-1677:
---

 Summary: Governor on concurrent replica reassignments
 Key: KAFKA-1677
 URL: https://issues.apache.org/jira/browse/KAFKA-1677
 Project: Kafka
  Issue Type: New Feature
  Components: controller
Reporter: Ryan Berdeen
Assignee: Neha Narkhede


We have seen a cluster be killed via too many concurrent partition transfers. 
An ideal solution is a configuration setting to limit the number of concurrent 
transfers per host (dynamically tunable). (eg: transfer_limit defined in 
http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/#Ring).

To work around this, we generate our assignments, then use a tool to feed the 
reassignments in small batches.

The size of the batch is based on either
* *the number of partitions*, e.g., reassign all replicas for the first 2 
partitions that have any moves
* *the number of individual replica moves*, e.g. when reassigning \[1,2,3,4] to 
\[5,6,7,8], first reassign to \[5,6,3,4] then reassign to \[5,6,7,8]
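The small-batch workaround described above can be sketched as a trivial splitter; `batchByPartition` is a hypothetical helper, not an actual Kafka tool, that chunks a full partition-to-replicas plan into at most `batchSize` partitions per reassignment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReassignmentBatcher {
    // Split a partition -> replica-list plan into batches of at most
    // batchSize partitions, to be executed one batch at a time.
    static List<Map<String, List<Integer>>> batchByPartition(
            Map<String, List<Integer>> plan, int batchSize) {
        List<Map<String, List<Integer>>> batches = new ArrayList<>();
        Map<String, List<Integer>> current = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : plan.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        plan.put("t-0", List.of(5, 6, 7, 8));
        plan.put("t-1", List.of(5, 6, 3, 4));
        plan.put("t-2", List.of(1, 2, 3, 4));
        System.out.println(batchByPartition(plan, 2).size()); // 2 batches
    }
}
```

Batching by individual replica moves, the second option above, would instead split each target replica list into intermediate steps before chunking.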





[jira] [Created] (KAFKA-1678) add new options for reassign partition to in service a replacement replica for another one

2014-10-06 Thread Joe Stein (JIRA)
Joe Stein created KAFKA-1678:


 Summary: add new options for reassign partition to in service a 
replacement replica for another one
 Key: KAFKA-1678
 URL: https://issues.apache.org/jira/browse/KAFKA-1678
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
 Fix For: 0.8.3


to move everything from an old broker to a new one






[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manager dead brokers

2014-10-06 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1678:
-
Summary: add new options for reassign partition to better manager dead 
brokers  (was: add new options for reassign partition to in service a 
replacement replica for another one)

> add new options for reassign partition to better manager dead brokers
> -
>
> Key: KAFKA-1678
> URL: https://issues.apache.org/jira/browse/KAFKA-1678
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
> Fix For: 0.8.3
>
>
> to move everything from an old broker to a new one





[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manager dead brokers

2014-10-06 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1678:
-
Description: 
this is in two forms

--replace-replica 

which is from broker.id to broker.id

and 

--remove-replica

which is just a single broker.id

  was:to move all everything from an old broker to a new one


> add new options for reassign partition to better manager dead brokers
> -
>
> Key: KAFKA-1678
> URL: https://issues.apache.org/jira/browse/KAFKA-1678
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
> Fix For: 0.8.3
>
>
> this is in two forms
> --replace-replica 
> which is from broker.id to broker.id
> and 
> --remove-replica
> which is just a single broker.id





[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manager dead brokers

2014-10-06 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1678:
-
Labels: operations  (was: )

> add new options for reassign partition to better manager dead brokers
> -
>
> Key: KAFKA-1678
> URL: https://issues.apache.org/jira/browse/KAFKA-1678
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>  Labels: operations
> Fix For: 0.8.3
>
>
> this is in two forms
> --replace-replica 
> which is from broker.id to broker.id
> and 
> --remove-replica
> which is just a single broker.id





[jira] [Assigned] (KAFKA-1678) add new options for reassign partition to better manager dead brokers

2014-10-06 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned KAFKA-1678:
---

Assignee: Gwen Shapira

> add new options for reassign partition to better manager dead brokers
> -
>
> Key: KAFKA-1678
> URL: https://issues.apache.org/jira/browse/KAFKA-1678
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Gwen Shapira
>  Labels: operations
> Fix For: 0.8.3
>
>
> this is in two forms
> --replace-replica 
> which is from broker.id to broker.id
> and 
> --remove-replica
> which is just a single broker.id





[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2014-10-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160717#comment-14160717
 ] 

Guozhang Wang commented on KAFKA-1367:
--

Just talked to Joel offline. Since the ISR (and also leader) info on the 
broker is just a cached snapshot and cannot really be used in a scenario like 
this (i.e. depending on the ISR list to determine whether an ack received with 
the -1 setting is reliable, since the ISR can shrink while the ack is being 
sent back), we could remove the ISR cache from the brokers and also remove it 
from the metadata response, unless there is a clear use case for this 
information.

> Broker topic metadata not kept in sync with ZooKeeper
> -
>
> Key: KAFKA-1367
> URL: https://issues.apache.org/jira/browse/KAFKA-1367
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Ryan Berdeen
>  Labels: newbie++
> Attachments: KAFKA-1367.txt
>
>
> When a broker is restarted, the topic metadata responses from the brokers 
> will be incorrect (different from ZooKeeper) until a preferred replica leader 
> election.
> In the metadata, it looks like leaders are correctly removed from the ISR 
> when a broker disappears, but followers are not. Then, when a broker 
> reappears, the ISR is never updated.
> I used a variation of the Vagrant setup created by Joe Stein to reproduce 
> this with latest from the 0.8.1 branch: 
> https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c





[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2014-10-06 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160772#comment-14160772
 ] 

Neha Narkhede commented on KAFKA-1367:
--

The ISR cache on the broker was added only because we had to expose that 
information through the topic metadata response. I don't think we gave a lot of 
thought, back then, to why the ISR information is useful in the topic metadata 
response (especially since it's stale and effectively inaccurate). I am not 
entirely sure if having the controller be aware of all ISR changes is terrible 
even though it's true that the # of watches it has to add is proportional to 
the # of partitions in a cluster. But it's not worth doing that if we don't 
find a use for the ISR information in the topic metadata response. So I'd vote 
for removing ISR from topic metadata and also from the broker's metadata cache.

> Broker topic metadata not kept in sync with ZooKeeper
> -
>
> Key: KAFKA-1367
> URL: https://issues.apache.org/jira/browse/KAFKA-1367
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Ryan Berdeen
>  Labels: newbie++
> Attachments: KAFKA-1367.txt
>
>
> When a broker is restarted, the topic metadata responses from the brokers 
> will be incorrect (different from ZooKeeper) until a preferred replica leader 
> election.
> In the metadata, it looks like leaders are correctly removed from the ISR 
> when a broker disappears, but followers are not. Then, when a broker 
> reappears, the ISR is never updated.
> I used a variation of the Vagrant setup created by Joe Stein to reproduce 
> this with latest from the 0.8.1 branch: 
> https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c





[jira] [Commented] (KAFKA-1678) add new options for reassign partition to better manager dead brokers

2014-10-06 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160783#comment-14160783
 ] 

Neha Narkhede commented on KAFKA-1678:
--

What is the purpose of replace replica? If the only purpose is to move the 
broker from one faulty machine to another, you could just restart a broker with 
the same broker.id. Decommissioning a broker is more useful, as Kafka would 
have to evenly distribute the replicas on the broker being decommissioned onto 
other brokers in the cluster. It's also worth thinking about how this change 
would affect the rest of the user experience with the reassign partitions 
command. Today, the way it's designed, there are multiple options to generate a 
replica assignment and one way to execute it. It's not very intuitive, though 
that change is bigger and hence probably deserves its own JIRA. Presumably, you 
are suggesting adding another option for generating an assignment?

> add new options for reassign partition to better manager dead brokers
> -
>
> Key: KAFKA-1678
> URL: https://issues.apache.org/jira/browse/KAFKA-1678
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Gwen Shapira
>  Labels: operations
> Fix For: 0.8.3
>
>
> this is in two forms
> --replace-replica 
> which is from broker.id to broker.id
> and 
> --remove-replica
> which is just a single broker.id





[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manager dead brokers

2014-10-06 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1678:
-
Reviewer: Neha Narkhede

> add new options for reassign partition to better manager dead brokers
> -
>
> Key: KAFKA-1678
> URL: https://issues.apache.org/jira/browse/KAFKA-1678
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Gwen Shapira
>  Labels: operations
> Fix For: 0.8.3
>
>
> this is in two forms
> --replace-replica 
> which is from broker.id to broker.id
> and 
> --remove-replica
> which is just a single broker.id





[jira] [Commented] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled

2014-10-06 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160789#comment-14160789
 ] 

Jay Kreps commented on KAFKA-1305:
--

FWIW, from my perspective being able to enable auto leader balancing would be a 
huge win. This is arguably the biggest operational "gotcha" today...

> Controller can hang on controlled shutdown with auto leader balance enabled
> ---
>
> Key: KAFKA-1305
> URL: https://issues.apache.org/jira/browse/KAFKA-1305
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joel Koshy
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.8.2, 0.9.0
>
>
> This is relatively easy to reproduce especially when doing a rolling bounce.
> What happened here is as follows:
> 1. The previous controller was bounced and broker 265 became the new 
> controller.
> 2. I went on to do a controlled shutdown of broker 265 (the new controller).
> 3. In the mean time the automatically scheduled preferred replica leader 
> election process started doing its thing and started sending 
> LeaderAndIsrRequests/UpdateMetadataRequests to itself (and other brokers).  
> (t@113 below).
> 4. While that's happening, the controlled shutdown process on 265 succeeds 
> and proceeds to deregister itself from ZooKeeper and shuts down the socket 
> server.
> 5. (ReplicaStateMachine actually removes deregistered brokers from the 
> controller channel manager's list of brokers to send requests to.  However, 
> that removal cannot take place (t@18 below) because preferred replica leader 
> election task owns the controller lock.)
> 6. So the request thread to broker 265 gets into infinite retries.
> 7. The entire broker shutdown process is blocked on controller shutdown for 
> the same reason (it needs to acquire the controller lock).
> Relevant portions from the thread-dump:
> "Controller-265-to-broker-265-send-thread" - Thread t@113
>java.lang.Thread.State: TIMED_WAITING
>   at java.lang.Thread.sleep(Native Method)
>   at 
> kafka.controller.RequestSendThread$$anonfun$liftedTree1$1$1.apply$mcV$sp(ControllerChannelManager.scala:143)
>   at kafka.utils.Utils$.swallow(Utils.scala:167)
>   at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>   at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
>   at kafka.utils.Logging$class.swallow(Logging.scala:94)
>   at kafka.utils.Utils$.swallow(Utils.scala:46)
>   at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:143)
>   at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
>   - locked java.lang.Object@6dbf14a7
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
>Locked ownable synchronizers:
>   - None
> ...
> "Thread-4" - Thread t@17
>java.lang.Thread.State: WAITING on 
> java.util.concurrent.locks.ReentrantLock$NonfairSync@4836840 owned by: 
> kafka-scheduler-0
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>   at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>   at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>   at kafka.utils.Utils$.inLock(Utils.scala:536)
>   at kafka.controller.KafkaController.shutdown(KafkaController.scala:642)
>   at 
> kafka.server.KafkaServer$$anonfun$shutdown$9.apply$mcV$sp(KafkaServer.scala:242)
>   at kafka.utils.Utils$.swallow(Utils.scala:167)
>   at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>   at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
>   at kafka.utils.Logging$class.swallow(Logging.scala:94)
>   at kafka.utils.Utils$.swallow(Utils.scala:46)
>   at kafka.server.KafkaServer.shutdown(KafkaServer.scala:242)
>   at 
> kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:46)
>   at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> ...
> "kafka-scheduler-0" - Thread t@117
>java.lang.Thread.State: WAITING on 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1dc407fc
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>   at 
>
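The hang described in steps 1-7 reduces to a single shared lock: the scheduler thread holds the controller lock while the send thread retries forever, so the shutdown path can never acquire it. The sketch below models just that interaction with plain java.util.concurrent primitives; the thread and variable names mirror the dump above but are otherwise illustrative, not Kafka code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ControllerDeadlockSketch {
    public static void main(String[] args) throws Exception {
        ReentrantLock controllerLock = new ReentrantLock(); // stands in for KafkaController's lock
        CountDownLatch lockHeld = new CountDownLatch(1);

        // "kafka-scheduler-0": preferred replica election takes the controller lock
        // and never releases it, because the send thread it waits on retries forever.
        Thread scheduler = new Thread(() -> {
            controllerLock.lock();
            lockHeld.countDown();
            try {
                Thread.sleep(Long.MAX_VALUE); // models the unbounded retry loop
            } catch (InterruptedException e) {
                // interrupted only when the demo ends
            } finally {
                controllerLock.unlock();
            }
        }, "kafka-scheduler-0");
        scheduler.start();
        lockHeld.await();

        // "Thread-4": KafkaController.shutdown() needs the same lock and blocks.
        // A bounded tryLock makes the hang observable instead of waiting forever.
        boolean acquired = controllerLock.tryLock(200, TimeUnit.MILLISECONDS);
        System.out.println("shutdown acquired controller lock: " + acquired);

        scheduler.interrupt();
        scheduler.join();
    }
}
```

Running it prints that shutdown never got the lock, which is exactly the state frozen in the thread dump.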

[jira] [Commented] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled

2014-10-06 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160815#comment-14160815
 ] 

Andrew Olson commented on KAFKA-1305:
-

> from my perspective being able to enable auto leader balancing would be a 
> huge win

We are running with auto leader balance enabled and controlled shutdown 
disabled. Given that they're currently mutually exclusive options, is 
controlled shutdown generally considered more valuable than auto leader 
balancing? If so, why is that?

> Controller can hang on controlled shutdown with auto leader balance enabled
> ---
>
> Key: KAFKA-1305
> URL: https://issues.apache.org/jira/browse/KAFKA-1305
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joel Koshy
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.8.2, 0.9.0
>
>
> This is relatively easy to reproduce especially when doing a rolling bounce.

[jira] [Commented] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled

2014-10-06 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160828#comment-14160828
 ] 

Jay Kreps commented on KAFKA-1305:
--

Well I guess the point is that they shouldn't be mutually exclusive. So 
hopefully we can make them both be enabled by default.

> Controller can hang on controlled shutdown with auto leader balance enabled
> ---
>
> Key: KAFKA-1305
> URL: https://issues.apache.org/jira/browse/KAFKA-1305
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joel Koshy
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.8.2, 0.9.0
>
>

[jira] [Commented] (KAFKA-1678) add new options for reassign partition to better manager dead brokers

2014-10-06 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160848#comment-14160848
 ] 

Gwen Shapira commented on KAFKA-1678:
-

1. It just seems like a reasonable thing to support. Managing re-use of brokers 
can be a headache for admins, especially since we are moving in the direction 
of auto-generating broker ids (and a lot of places already do that in their 
automated install scripts). Joe's example of cloud installs was a good use-case 
too.

2. In the future we'll have security, and brokers will need to authenticate with 
each other in order to get replicas. Losing a broker can mean losing its keys, 
and then adding a new broker that looks just like the old one will be even more 
painful.

> add new options for reassign partition to better manager dead brokers
> -
>
> Key: KAFKA-1678
> URL: https://issues.apache.org/jira/browse/KAFKA-1678
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joe Stein
>Assignee: Gwen Shapira
>  Labels: operations
> Fix For: 0.8.3
>
>
> this is in two forms
> --replace-replica 
> which is from broker.id to broker.id
> and 
> --remove-replica
> which is just a single broker.id



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25886: KAFKA-1555: provide strong consistency with reasonable availability

2014-10-06 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25886/
---

(Updated Oct. 6, 2014, 8:28 p.m.)


Review request for kafka.


Changes
---

Fixed additional comments by Jun Rao


Repository: kafka


Description
---

KAFKA-1555: provide strong consistency with reasonable availability


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
f9de4af 
  clients/src/main/java/org/apache/kafka/common/config/ConfigDef.java addc906 
  
clients/src/main/java/org/apache/kafka/common/errors/NotEnoughReplicasAfterAppendException.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/common/errors/NotEnoughReplicasException.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/common/protocol/Errors.java d434f42 
  core/src/main/scala/kafka/cluster/Partition.scala ff106b4 
  core/src/main/scala/kafka/common/ErrorMapping.scala 3fae791 
  core/src/main/scala/kafka/common/NotEnoughReplicasAfterAppendException.scala 
PRE-CREATION 
  core/src/main/scala/kafka/common/NotEnoughReplicasException.scala 
PRE-CREATION 
  core/src/main/scala/kafka/log/LogConfig.scala 5746ad4 
  core/src/main/scala/kafka/producer/SyncProducerConfig.scala 69b2d0c 
  core/src/main/scala/kafka/server/KafkaApis.scala c584b55 
  core/src/main/scala/kafka/server/KafkaConfig.scala 165c816 
  core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 
39f777b 
  core/src/test/scala/unit/kafka/producer/ProducerTest.scala dd71d81 
  core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala 24deea0 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 2dbdd3c 

Diff: https://reviews.apache.org/r/25886/diff/


Testing
---

With a 3-broker cluster, created 3 topics, each with 1 partition and 3 replicas, 
with min.insync.replicas of 1, 3, and 4.
* min.insync.replicas=1 behaved normally (all writes succeeded as long as a 
broker was up)
* min.insync.replicas=3 returned NotEnoughReplicas when required.acks=-1 and 
one broker was down
* min.insync.replicas=4 returned NotEnoughReplicas when required.acks=-1

See notes about retry behavior in the JIRA.
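For reference, the topic-level override exercised in this test run can be set with the same CLI style used elsewhere in these threads. This is a sketch: `my-topic` and `zkhost` are placeholders, and the `min.insync.replicas` config key is taken from this patch's diff (LogConfig.scala), not from released docs.

```shell
# Topic-level override (my-topic / zkhost are placeholders):
bin/kafka-topics.sh --alter --topic my-topic \
  --config min.insync.replicas=2 --zookeeper zkhost:2181

# The bound is only enforced for acks=-1 producers, e.g. the old producer's
#   request.required.acks=-1
```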


Thanks,

Gwen Shapira



[jira] [Updated] (KAFKA-1555) provide strong consistency with reasonable availability

2014-10-06 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-1555:

Attachment: KAFKA-1555.6.patch

Latest version, based on RB discussion.

> provide strong consistency with reasonable availability
> ---
>
> Key: KAFKA-1555
> URL: https://issues.apache.org/jira/browse/KAFKA-1555
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Jiang Wu
>Assignee: Gwen Shapira
> Fix For: 0.8.2
>
> Attachments: KAFKA-1555.0.patch, KAFKA-1555.1.patch, 
> KAFKA-1555.2.patch, KAFKA-1555.3.patch, KAFKA-1555.4.patch, 
> KAFKA-1555.5.patch, KAFKA-1555.5.patch, KAFKA-1555.6.patch
>
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases such as two brokers are down, service can be blocked, but 
> no message loss happens.
> We found that the current kafka version (0.8.1.1) cannot achieve the requirements 
> due to three of its behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with fewer 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has fewer messages than the leader.
> 3. ISR can contain only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, any existing configuration cannot satisfy the requirements.
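Case 2 of the proof (acks=2) can be made concrete with a toy model. The broker names follow the A/B/C convention from the quoted analysis; this is a sketch of the failure scenario, not Kafka code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AcksTwoLossSketch {
    public static void main(String[] args) {
        // Replicas A (leader), B, C are all in ISR; message m reaches only A and B.
        Set<String> hasM = new HashSet<>(Arrays.asList("A", "B"));

        // acks=2: the producer is told the write succeeded once 2 replicas have m.
        boolean acked = hasM.size() >= 2;

        // A is killed. C never received m but is still in ISR (behavior 2 above),
        // so C can be elected as the new leader.
        String newLeader = "C";

        // m was acknowledged, yet the new leader never had it: consumers miss m.
        boolean lost = acked && !hasM.contains(newLeader);
        System.out.println("acked=" + acked + ", lost=" + lost);
    }
}
```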



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2014-10-06 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160927#comment-14160927
 ] 

Magnus Edenhill commented on KAFKA-1367:


May I suggest not changing the protocol, but only sending an empty ISR vector 
in the MetadataResponse?

> Broker topic metadata not kept in sync with ZooKeeper
> -
>
> Key: KAFKA-1367
> URL: https://issues.apache.org/jira/browse/KAFKA-1367
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Ryan Berdeen
>  Labels: newbie++
> Attachments: KAFKA-1367.txt
>
>
> When a broker is restarted, the topic metadata responses from the brokers 
> will be incorrect (different from ZooKeeper) until a preferred replica leader 
> election.
> In the metadata, it looks like leaders are correctly removed from the ISR 
> when a broker disappears, but followers are not. Then, when a broker 
> reappears, the ISR is never updated.
> I used a variation of the Vagrant setup created by Joe Stein to reproduce 
> this with latest from the 0.8.1 branch: 
> https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 26390: Fix KAFKA-1641

2014-10-06 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26390/
---

Review request for kafka.


Bugs: KAFKA-1641
https://issues.apache.org/jira/browse/KAFKA-1641


Repository: kafka


Description
---

Reset cleaning start offset upon abnormal log truncation


Diffs
-

  core/src/main/scala/kafka/log/LogCleanerManager.scala 
e8ced6a5922508ea3274905be7c3d6e728f320ac 

Diff: https://reviews.apache.org/r/26390/diff/


Testing
---


Thanks,

Guozhang Wang



[jira] [Updated] (KAFKA-1641) Log cleaner exits if last cleaned offset is lower than earliest offset

2014-10-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1641:
-
Attachment: KAFKA-1641.patch

> Log cleaner exits if last cleaned offset is lower than earliest offset
> --
>
> Key: KAFKA-1641
> URL: https://issues.apache.org/jira/browse/KAFKA-1641
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Joel Koshy
> Attachments: KAFKA-1641.patch
>
>
> Encountered this recently: the log cleaner exited a while ago (I think 
> because the topic had compressed messages). That issue was subsequently 
> addressed by having the producer only send uncompressed. However, on a 
> subsequent restart of the broker we see this:
> In this scenario I think it is reasonable to just emit a warning and have the 
> cleaner round up its first dirty offset to the base offset of the first 
> segment.
> {code}
> [kafka-server] [] [kafka-log-cleaner-thread-0], Error due to 
> java.lang.IllegalArgumentException: requirement failed: Last clean offset is 
> 54770438 but segment base offset is 382844024 for log testtopic-0.
> at scala.Predef$.require(Predef.scala:145)
> at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:491)
> at kafka.log.Cleaner.clean(LogCleaner.scala:288)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:202)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:187)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> {code}
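The suggested fix (emit a warning and round the first dirty offset up to the earliest segment's base offset) amounts to a small clamp before the `require`. The method below is a sketch of that proposal using the offsets from the reported error; it is not the actual LogCleanerManager code.

```java
public class DirtyOffsetSketch {
    // If the checkpointed clean offset falls before the log's earliest segment
    // (e.g. after abnormal truncation), warn and round it up instead of failing.
    static long firstDirtyOffset(long checkpointed, long earliestBaseOffset) {
        if (checkpointed < earliestBaseOffset) {
            System.out.println("WARN: resetting dirty offset " + checkpointed
                    + " to segment base offset " + earliestBaseOffset);
            return earliestBaseOffset;
        }
        return checkpointed;
    }

    public static void main(String[] args) {
        // Offsets from the error message above: 54770438 vs base offset 382844024.
        System.out.println(firstDirtyOffset(54770438L, 382844024L));
    }
}
```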



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1641) Log cleaner exits if last cleaned offset is lower than earliest offset

2014-10-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1641:
-
Assignee: Guozhang Wang
  Status: Patch Available  (was: Open)

> Log cleaner exits if last cleaned offset is lower than earliest offset
> --
>
> Key: KAFKA-1641
> URL: https://issues.apache.org/jira/browse/KAFKA-1641
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Joel Koshy
>Assignee: Guozhang Wang
> Attachments: KAFKA-1641.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1641) Log cleaner exits if last cleaned offset is lower than earliest offset

2014-10-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161121#comment-14161121
 ] 

Guozhang Wang commented on KAFKA-1641:
--

Created reviewboard https://reviews.apache.org/r/26390/diff/
 against branch origin/trunk

> Log cleaner exits if last cleaned offset is lower than earliest offset
> --
>
> Key: KAFKA-1641
> URL: https://issues.apache.org/jira/browse/KAFKA-1641
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Joel Koshy
> Attachments: KAFKA-1641.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1670) Corrupt log files for segment.bytes values close to Int.MaxInt

2014-10-06 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161167#comment-14161167
 ] 

Gwen Shapira commented on KAFKA-1670:
-

I think we'll want to change the segment size variable to "long" and remove the 
4GB limit (which seems rather low for modern systems).
This may make sense as a separate Jira, since [~harsha_ch]'s patch is necessary in 
any case.


> Corrupt log files for segment.bytes values close to Int.MaxInt
> --
>
> Key: KAFKA-1670
> URL: https://issues.apache.org/jira/browse/KAFKA-1670
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Ryan Berdeen
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1670.patch, KAFKA-1670_2014-10-04_20:17:46.patch, 
> KAFKA-1670_2014-10-06_09:48:25.patch
>
>
> The maximum value for the topic-level config {{segment.bytes}} is 
> {{Int.MaxInt}} (2147483647). *Using this value causes brokers to corrupt 
> their log files, leaving them unreadable.*
> We set {{segment.bytes}} to {{2122317824}} which is well below the maximum. 
> One by one, the ISR of all partitions shrunk to 1. Brokers would crash when 
> restarted, attempting to read from a negative offset in a log file. After 
> discovering that many segment files had grown to 4GB or more, we were forced 
> to shut down our *entire production Kafka cluster* for several hours while we 
> split all segment files into 1GB chunks.
> Looking into the {{kafka.log}} code, the {{segment.bytes}} parameter is used 
> inconsistently. It is treated as a *soft* maximum for the size of the segment 
> file 
> (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/LogConfig.scala#L26)
>  with logs rolled only after 
> (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/Log.scala#L246)
>  they exceed this value. However, much of the code that deals with log files 
> uses *ints* to store the size of the file and the position in the file. 
> Overflow of these ints leads the broker to append to the segments 
> indefinitely, and to fail to read these segments for consuming or recovery.
> This is trivial to reproduce:
> {code}
> $ bin/kafka-topics.sh --topic segment-bytes-test --create 
> --replication-factor 2 --partitions 1 --zookeeper zkhost:2181
> $ bin/kafka-topics.sh --topic segment-bytes-test --alter --config 
> segment.bytes=2147483647 --zookeeper zkhost:2181
> $ yes "Int.MaxValue is a ridiculous bound on file size in 2014" | 
> bin/kafka-console-producer.sh --broker-list localhost:6667 zkhost:2181 
> --topic segment-bytes-test
> {code}
> After running for a few minutes, the log file is corrupt:
> {code}
> $ ls -lh data/segment-bytes-test-0/
> total 9.7G
> -rw-r--r-- 1 root root  10M Oct  3 19:39 .index
> -rw-r--r-- 1 root root 9.7G Oct  3 19:39 .log
> {code}
> We recovered the data from the log files using a simple Python script: 
> https://gist.github.com/also/9f823d9eb9dc0a410796



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1670) Corrupt log files for segment.bytes values close to Int.MaxInt

2014-10-06 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161185#comment-14161185
 ] 

Jay Kreps commented on KAFKA-1670:
--

Actually the reason for limiting the segments to 4GB (or possibly 2GB, since we 
are using Java ints) is to keep the index pointers into the file limited to 4 
bytes, which keeps the index files small (4 byte relative offset and 4 byte file 
position). Index file size and density are important since we hope to keep those 
cached to make lookups cheap. We should fix the variable to avoid overflow and 
even extend to unsigned ints, but we probably can't allow arbitrarily large 
segment files very easily.

The reasoning at the time was basically that there is no particular reason to 
want very large segment files, and since we always do recovery from the 
beginning of the file, having 10GB segments would cause other problems when you 
crashed and had to do recovery on hundreds of 10GB files.

> Corrupt log files for segment.bytes values close to Int.MaxInt
> --
>
> Key: KAFKA-1670
> URL: https://issues.apache.org/jira/browse/KAFKA-1670
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Ryan Berdeen
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1670.patch, KAFKA-1670_2014-10-04_20:17:46.patch, 
> KAFKA-1670_2014-10-06_09:48:25.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1644) Inherit FetchResponse from RequestOrResponse

2014-10-06 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161187#comment-14161187
 ] 

Jun Rao commented on KAFKA-1644:


Perhaps it's better to throw an UnsupportedOperationException in writeTo()?

> Inherit FetchResponse from RequestOrResponse
> 
>
> Key: KAFKA-1644
> URL: https://issues.apache.org/jira/browse/KAFKA-1644
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Anton Karamanov
>Assignee: Anton Karamanov
> Attachments: 
> 0001-KAFKA-1644-Inherit-FetchResponse-from-RequestOrRespo.patch, 
> 0002-KAFKA-1644-Inherit-FetchResponse-from-RequestOrRespo.patch
>
>
> Unlike all other Kafka API responses {{FetchResponse}} is not a subclass of 
> RequestOrResponse, which requires handling it as a special case while 
> processing responses.





[jira] [Commented] (KAFKA-1677) Governor on concurrent replica reassignments

2014-10-06 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161188#comment-14161188
 ] 

Jay Kreps commented on KAFKA-1677:
--

Agreed, LinkedIn is doing a similar thing. This should definitely be done 
automatically.

> Governor on concurrent replica reassignments
> 
>
> Key: KAFKA-1677
> URL: https://issues.apache.org/jira/browse/KAFKA-1677
> Project: Kafka
>  Issue Type: New Feature
>  Components: controller
>Reporter: Ryan Berdeen
>Assignee: Neha Narkhede
>
> We have seen a cluster be killed via too many concurrent partition transfers. 
> An ideal solution is a configuration setting to limit the number of 
> concurrent transfers per host (dynamically tunable). (eg: transfer_limit 
> defined in 
> http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/#Ring).
> To work around this, we generate our assignments, then use a tool to feed the 
> reassignments in small batches.
> The size of the batch is based on either
> * *the number of partitions*, e.g., reassign all replicas for the first 2 
> partitions that have any moves
> * *the number of individual replica moves*, e.g. when reassigning \[1,2,3,4] 
> to \[5,6,7,8], first reassign to \[5,6,3,4] then reassign to \[5,6,7,8]
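
The batching strategy described above (limiting the number of individual replica moves per step) can be sketched as follows; the helper is hypothetical, not the actual tool:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReassignBatcher {
    // Hypothetical helper: step from the current assignment toward the target,
    // moving at most `batch` individual replicas per step. Assumes both lists
    // have the same length (same replication factor).
    static List<List<Integer>> batches(List<Integer> current, List<Integer> target, int batch) {
        List<List<Integer>> steps = new ArrayList<>();
        List<Integer> state = new ArrayList<>(current);
        while (!state.equals(target)) {
            int moves = 0;
            for (int i = 0; i < state.size() && moves < batch; i++) {
                if (!state.get(i).equals(target.get(i))) {
                    state.set(i, target.get(i));
                    moves++;
                }
            }
            steps.add(new ArrayList<>(state));
        }
        return steps;
    }

    public static void main(String[] args) {
        // reassigning [1,2,3,4] to [5,6,7,8], two replica moves at a time:
        System.out.println(batches(Arrays.asList(1, 2, 3, 4), Arrays.asList(5, 6, 7, 8), 2));
        // prints [[5, 6, 3, 4], [5, 6, 7, 8]]
    }
}
```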





[jira] [Updated] (KAFKA-1674) auto.create.topics.enable docs are misleading

2014-10-06 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1674:
---
Labels: newbie  (was: )

> auto.create.topics.enable docs are misleading
> -
>
> Key: KAFKA-1674
> URL: https://issues.apache.org/jira/browse/KAFKA-1674
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Stevo Slavic
>Priority: Minor
>  Labels: newbie
>
> {{auto.create.topics.enable}} is currently 
> [documented|http://kafka.apache.org/08/configuration.html] with
> {quote}
> Enable auto creation of topic on the server. If this is set to true then 
> attempts to produce, consume, or fetch metadata for a non-existent topic will 
> automatically create it with the default replication factor and number of 
> partitions.
> {quote}
> In Kafka 0.8.1.1 reality, topics are only created when trying to publish a 
> message on a non-existing topic.
> After 
> [discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAFbh0Q1WXLUDO-im1fQ1yEvrMduxmXbj5HXVc3Cq8B%3DfeMso9g%40mail.gmail.com%3E]
>  with [~junrao], the conclusion was that it's a documentation issue which 
> needs to be fixed.
> Before fixing docs, please check once more if this is just non-working 
> functionality.





[jira] [Resolved] (KAFKA-1468) Improve perf tests

2014-10-06 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-1468.

   Resolution: Fixed
Fix Version/s: 0.8.2
 Assignee: Jay Kreps

Resolving the jira since it's already committed.

> Improve perf tests
> --
>
> Key: KAFKA-1468
> URL: https://issues.apache.org/jira/browse/KAFKA-1468
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: KAFKA-1468.patch, KAFKA-1468_2014-05-28_16:15:01.patch
>
>
> This issue is a placeholder for a bunch of improvements that came out of a 
> round of benchmarking.





[jira] [Commented] (KAFKA-1670) Corrupt log files for segment.bytes values close to Int.MaxInt

2014-10-06 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161207#comment-14161207
 ] 

Gwen Shapira commented on KAFKA-1670:
-

Makes sense. Thanks [~jkreps]

> Corrupt log files for segment.bytes values close to Int.MaxInt
> --
>
> Key: KAFKA-1670
> URL: https://issues.apache.org/jira/browse/KAFKA-1670
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Ryan Berdeen
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1670.patch, KAFKA-1670_2014-10-04_20:17:46.patch, 
> KAFKA-1670_2014-10-06_09:48:25.patch
>
>
> The maximum value for the topic-level config {{segment.bytes}} is 
> {{Int.MaxInt}} (2147483647). *Using this value causes brokers to corrupt 
> their log files, leaving them unreadable.*
> We set {{segment.bytes}} to {{2122317824}} which is well below the maximum. 
> One by one, the ISR of all partitions shrunk to 1. Brokers would crash when 
> restarted, attempting to read from a negative offset in a log file. After 
> discovering that many segment files had grown to 4GB or more, we were forced 
> to shut down our *entire production Kafka cluster* for several hours while we 
> split all segment files into 1GB chunks.
> Looking into the {{kafka.log}} code, the {{segment.bytes}} parameter is used 
> inconsistently. It is treated as a *soft* maximum for the size of the segment 
> file 
> (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/LogConfig.scala#L26)
>  with logs rolled only after 
> (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/Log.scala#L246)
>  they exceed this value. However, much of the code that deals with log files 
> uses *ints* to store the size of the file and the position in the file. 
> Overflow of these ints leads the broker to append to the segments 
> indefinitely, and to fail to read these segments for consuming or recovery.
> This is trivial to reproduce:
> {code}
> $ bin/kafka-topics.sh --topic segment-bytes-test --create 
> --replication-factor 2 --partitions 1 --zookeeper zkhost:2181
> $ bin/kafka-topics.sh --topic segment-bytes-test --alter --config 
> segment.bytes=2147483647 --zookeeper zkhost:2181
> $ yes "Int.MaxValue is a ridiculous bound on file size in 2014" | 
> bin/kafka-console-producer.sh --broker-list localhost:6667 zkhost:2181 
> --topic segment-bytes-test
> {code}
> After running for a few minutes, the log file is corrupt:
> {code}
> $ ls -lh data/segment-bytes-test-0/
> total 9.7G
> -rw-r--r-- 1 root root  10M Oct  3 19:39 .index
> -rw-r--r-- 1 root root 9.7G Oct  3 19:39 .log
> {code}
> We recovered the data from the log files using a simple Python script: 
> https://gist.github.com/also/9f823d9eb9dc0a410796





Review Request 26393: Follow-up KAFKA-1468

2014-10-06 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26393/
---

Review request for kafka.


Bugs: KAFKA-1468
https://issues.apache.org/jira/browse/KAFKA-1468


Repository: kafka


Description
---

Change array list back to linked list for watchers


Diffs
-

  core/src/main/scala/kafka/server/RequestPurgatory.scala 
cf3ed4c8f197d1197658645ccb55df0bce86bdd4 

Diff: https://reviews.apache.org/r/26393/diff/


Testing
---


Thanks,

Guozhang Wang



[jira] [Commented] (KAFKA-1468) Improve perf tests

2014-10-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161211#comment-14161211
 ] 

Guozhang Wang commented on KAFKA-1468:
--

Created reviewboard https://reviews.apache.org/r/26393/diff/
 against branch origin/trunk

> Improve perf tests
> --
>
> Key: KAFKA-1468
> URL: https://issues.apache.org/jira/browse/KAFKA-1468
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: KAFKA-1468.patch, KAFKA-1468.patch, 
> KAFKA-1468_2014-05-28_16:15:01.patch
>
>
> This issue is a placeholder for a bunch of improvements that came out of a 
> round of benchmarking.





[jira] [Updated] (KAFKA-1468) Improve perf tests

2014-10-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1468:
-
Attachment: KAFKA-1468.patch

> Improve perf tests
> --
>
> Key: KAFKA-1468
> URL: https://issues.apache.org/jira/browse/KAFKA-1468
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: KAFKA-1468.patch, KAFKA-1468.patch, 
> KAFKA-1468_2014-05-28_16:15:01.patch
>
>
> This issue is a placeholder for a bunch of improvements that came out of a 
> round of benchmarking.





[jira] [Commented] (KAFKA-1430) Purgatory redesign

2014-10-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161218#comment-14161218
 ] 

Guozhang Wang commented on KAFKA-1430:
--

I went through the change list from May 6th (first patch submitted) to Aug 5th 
(when it got committed). In that three-month period there are 31 commits. I have 
looked through all of them that may have code overlap with this one, and this 
seems to be the only revert. But it would be great to borrow another pair of 
eyes.

Moving forward, since we are having more and more simultaneous patch commits, I 
would like to suggest that we think about a code check-in process for large 
patches that may take more than a month to develop / review.

> Purgatory redesign
> --
>
> Key: KAFKA-1430
> URL: https://issues.apache.org/jira/browse/KAFKA-1430
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430.patch, 
> KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430_2014-06-05_17:07:37.patch, 
> KAFKA-1430_2014-06-05_17:13:13.patch, KAFKA-1430_2014-06-05_17:40:53.patch, 
> KAFKA-1430_2014-06-10_11:22:06.patch, KAFKA-1430_2014-06-10_11:26:02.patch, 
> KAFKA-1430_2014-07-11_10:59:13.patch, KAFKA-1430_2014-07-21_12:53:39.patch, 
> KAFKA-1430_2014-07-25_09:52:43.patch, KAFKA-1430_2014-07-28_11:30:23.patch, 
> KAFKA-1430_2014-07-31_15:04:33.patch, KAFKA-1430_2014-08-05_14:54:21.patch
>
>
> We have seen 2 main issues with the Purgatory.
> 1. There is no atomic checkAndWatch functionality. So, a client typically 
> first checks whether a request is satisfied or not and then register the 
> watcher. However, by the time the watcher is registered, the registered item 
> could already be satisfied. This item won't be satisfied until the next 
> update happens or the delayed time expires, which means the watched item 
> could be delayed. 
> 2. FetchRequestPurgatory doesn't quite work. This is because the current 
> design tries to incrementally maintain the accumulated bytes ready for fetch. 
> However, this is difficult since the right time to check whether a fetch (for 
> regular consumer) request is satisfied is when the high watermark moves. At 
> that point, it's hard to figure out how many bytes we should incrementally 
> add to each pending fetch request.
> The problem has been reported in KAFKA-1150 and KAFKA-703.
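
The first issue, the missing atomic checkAndWatch, can be sketched as a satisfaction check and a watcher registration performed under one lock (hypothetical names; the real purgatory is considerably more involved):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BooleanSupplier;

public class MiniPurgatory {
    // Sketch (hypothetical names) of an atomic checkAndWatch: the check and
    // the registration happen under one lock, so an operation that becomes
    // satisfiable concurrently with registration cannot be stranded until
    // its timeout.
    private final List<BooleanSupplier> watchers = new ArrayList<>();

    public synchronized boolean tryCompleteElseWatch(BooleanSupplier isSatisfied) {
        if (isSatisfied.getAsBoolean())
            return true;            // already satisfiable: completed, never watched
        watchers.add(isSatisfied);  // registered atomically with the check
        return false;
    }

    // Called on state changes (e.g. the high watermark moves): re-check watchers.
    public synchronized int checkAndComplete() {
        int completed = 0;
        Iterator<BooleanSupplier> it = watchers.iterator();
        while (it.hasNext()) {
            if (it.next().getAsBoolean()) {
                it.remove();
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        MiniPurgatory purgatory = new MiniPurgatory();
        boolean[] satisfied = {false};
        System.out.println(purgatory.tryCompleteElseWatch(() -> satisfied[0])); // false: watched
        satisfied[0] = true;
        System.out.println(purgatory.checkAndComplete()); // 1: completed on the update
    }
}
```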





Re: Review Request 26346: Patch for KAFKA-1670

2014-10-06 Thread Jun Rao


> On Oct. 6, 2014, 5:24 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/log/Log.scala, line 502
> > 
> >
> > We probably should enforce that a segment is never larger than 
> > config.segmentSize. So, if messageSize is larger than config.segmentSize, 
> > we should just throw an exception. Once we do that, it seems that we can 
> > just use the following check to cover both conditions.
> > 
> > if (segment.size + messageSize > (long) config.segmentSize
> > 
> > We likely need to adjust the comment above accordingly.
> 
> Sriharsha Chintalapani wrote:
> Sorry, I am not able to follow. Can you please elaborate on "So, if 
> messageSize is larger than config.segmentSize"?
> Here the issue is not that messageSize is larger than 
> config.segmentSize.
> Currently we only roll when segment.size is greater than 
> config.segmentSize, and the edge case here is:
> if config.segmentSize is Int.MaxValue and the current segment.size is 
> Int.MaxValue - 1, we still wouldn't roll the segment
> and would append the current batch to the same segment; the next time we 
> check, segment.size has overflowed to a negative value and still fails the 
> check segment.size > config.segmentSize, so we keep appending to the same 
> LogSegment.
> 
> if (segment.size + messageSize > (long) config.segmentSize
> This condition wouldn't work since segment.size is an Int, and if its value 
> is anywhere close to Int.MaxValue, adding the current message size will 
> cause it to overflow.
> 
> We can change the above condition to 
> if (segment.size.toLong + messageSize > config.segmentSize) 
> and change the comment to:
> the LogSegment will be rolled if segment.size + messageBatch.size is 
> greater than config.segmentSize. 
> Please let me know if these changes look good. Thanks.

Yes, my point is that the current implementation of config.segmentSize is a bit 
misleading. It actually allows a log segment to be larger than 
config.segmentSize. It's easier to understand if we disallow that. Also, if we 
never allow a segment to go beyond config.segmentSize, it will never overflow 
since config.segmentSize is capped at 2G. The following check that you 
suggested will suffice.

if (segment.size.toLong + messageSize > config.segmentSize)

Then the issue is what happens when the appended message set itself is larger 
than config.segmentSize. If you just have the above check, you will still allow 
a segment to have a size larger than config.segmentSize. Another issue is that 
if the active segment is empty, we will end up rolling a new segment with the 
same name, which will confuse the broker. In this case, it will be easier to 
reject the message set by throwing an exception.
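
The direction discussed here can be sketched as follows (names assumed, not the actual patch): widen to long before comparing so the sum cannot wrap, and reject a message set that could never fit in a single segment.

```java
public class RollCheck {
    // Sketch only: segmentSize is the current segment file size, messageSetSize
    // the bytes about to be appended, configSegmentBytes the segment.bytes config.
    static boolean shouldRoll(int segmentSize, int messageSetSize, int configSegmentBytes) {
        if (messageSetSize > configSegmentBytes)
            throw new IllegalArgumentException("message set larger than segment.bytes");
        // widening to long makes the near-Int.MaxValue case safe:
        return (long) segmentSize + messageSetSize > (long) configSegmentBytes;
    }

    public static void main(String[] args) {
        System.out.println(shouldRoll(Integer.MAX_VALUE - 1, 1000, Integer.MAX_VALUE)); // true
        System.out.println(shouldRoll(0, 1000, Integer.MAX_VALUE));                     // false
    }
}
```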


- Jun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55526
---


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> ---
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
> https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/log/Log.scala 
> 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



Re: Review Request 26373: Patch for KAFKA-1647

2014-10-06 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26373/#review55615
---


Since the first "if" statement is now only used for logging, could we just 
merge it into the second check?

- Guozhang Wang


On Oct. 6, 2014, 5:06 p.m., Jiangjie Qin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26373/
> ---
> 
> (Updated Oct. 6, 2014, 5:06 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1647
> https://issues.apache.org/jira/browse/KAFKA-1647
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fix for Kafka-1647.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> 78b7514cc109547c562e635824684fad581af653 
> 
> Diff: https://reviews.apache.org/r/26373/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>



[jira] [Updated] (KAFKA-1668) TopicCommand doesn't warn if --topic argument doesn't match any topics

2014-10-06 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-1668:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks.

> TopicCommand doesn't warn if --topic argument doesn't match any topics
> --
>
> Key: KAFKA-1668
> URL: https://issues.apache.org/jira/browse/KAFKA-1668
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Ryan Berdeen
>Assignee: Manikumar Reddy
>Priority: Minor
>  Labels: newbie
> Attachments: KAFKA-1668.patch
>
>
> Running {{kafka-topics.sh --alter}} with an invalid {{--topic}} argument 
> produces no output and exits with 0, indicating success.
> {code}
> $ bin/kafka-topics.sh --topic does-not-exist --alter --config invalid=xxx 
> --zookeeper zkhost:2181
> $ echo $?
> 0
> {code}
> An invalid topic name or a regular expression that matches 0 topics should at 
> least print a warning.





[jira] [Commented] (KAFKA-1468) Improve perf tests

2014-10-06 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161267#comment-14161267
 ] 

Jun Rao commented on KAFKA-1468:


Thanks for the followup patch. +1. Committed to trunk and 0.8.2.

> Improve perf tests
> --
>
> Key: KAFKA-1468
> URL: https://issues.apache.org/jira/browse/KAFKA-1468
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: KAFKA-1468.patch, KAFKA-1468.patch, 
> KAFKA-1468_2014-05-28_16:15:01.patch
>
>
> This issue is a placeholder for a bunch of improvements that came out of a 
> round of benchmarking.





[jira] [Resolved] (KAFKA-1430) Purgatory redesign

2014-10-06 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-1430.

Resolution: Fixed

The reverted change is committed in a followup patch in KAFKA-1468.

> Purgatory redesign
> --
>
> Key: KAFKA-1430
> URL: https://issues.apache.org/jira/browse/KAFKA-1430
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430.patch, 
> KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430_2014-06-05_17:07:37.patch, 
> KAFKA-1430_2014-06-05_17:13:13.patch, KAFKA-1430_2014-06-05_17:40:53.patch, 
> KAFKA-1430_2014-06-10_11:22:06.patch, KAFKA-1430_2014-06-10_11:26:02.patch, 
> KAFKA-1430_2014-07-11_10:59:13.patch, KAFKA-1430_2014-07-21_12:53:39.patch, 
> KAFKA-1430_2014-07-25_09:52:43.patch, KAFKA-1430_2014-07-28_11:30:23.patch, 
> KAFKA-1430_2014-07-31_15:04:33.patch, KAFKA-1430_2014-08-05_14:54:21.patch
>
>
> We have seen 2 main issues with the Purgatory.
> 1. There is no atomic checkAndWatch functionality. So, a client typically 
> first checks whether a request is satisfied or not and then register the 
> watcher. However, by the time the watcher is registered, the registered item 
> could already be satisfied. This item won't be satisfied until the next 
> update happens or the delayed time expires, which means the watched item 
> could be delayed. 
> 2. FetchRequestPurgatory doesn't quite work. This is because the current 
> design tries to incrementally maintain the accumulated bytes ready for fetch. 
> However, this is difficult since the right time to check whether a fetch (for 
> regular consumer) request is satisfied is when the high watermark moves. At 
> that point, it's hard to figure out how many bytes we should incrementally 
> add to each pending fetch request.
> The problem has been reported in KAFKA-1150 and KAFKA-703.





Re: Review Request 26346: Patch for KAFKA-1670

2014-10-06 Thread Jay Kreps

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55619
---



core/src/main/scala/kafka/log/Log.scala


I don't know if this comment is very intuitive. Let's just document that 
messageSize is the size of the message about to be appended and that this 
method will roll the log if either (1) the log is full, (2) the max time has 
elapsed, or (3) the index is full.



core/src/main/scala/kafka/log/Log.scala


It is a bit subtle that you are checking for overflow this way. What we 
mean to check is just that there is sufficient room in the segment for this 
message, which I think we can do by checking:

segment.size > config.segmentSize - messagesSize
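
A minimal sketch of this rearranged check (names assumed; for non-negative sizes it makes the same roll decision as widening to long):

```java
public class RollCheckNoWiden {
    // Sketch of the rearranged check: keep int arithmetic but subtract on the
    // config side, so nothing can wrap. Assumes messagesSize is non-negative;
    // if it exceeds configSegmentBytes the right-hand side goes negative and
    // the check forces a roll.
    static boolean shouldRoll(int segmentSize, int messagesSize, int configSegmentBytes) {
        return segmentSize > configSegmentBytes - messagesSize;
    }

    public static void main(String[] args) {
        // the near-Int.MaxValue case that the naive sum-based check got wrong:
        System.out.println(shouldRoll(Integer.MAX_VALUE - 1, 1000, Integer.MAX_VALUE)); // true
        System.out.println(shouldRoll(0, 1000, Integer.MAX_VALUE));                     // false
    }
}
```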


- Jay Kreps


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> ---
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
> https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/log/Log.scala 
> 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



[jira] [Created] (KAFKA-1679) JmxTool outputs nothing if any mbean attributes can't be retrieved

2014-10-06 Thread Ryan Berdeen (JIRA)
Ryan Berdeen created KAFKA-1679:
---

 Summary: JmxTool outputs nothing if any mbean attributes can't be 
retrieved
 Key: KAFKA-1679
 URL: https://issues.apache.org/jira/browse/KAFKA-1679
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Ryan Berdeen
Priority: Minor


JmxTool counts the number of attributes for all MBeans and if the number of 
attributes retrieved does not equal this number, nothing is printed.

Several {{java.lang:type=MemoryPool}} MBeans have unsupported attributes (see 
HADOOP-8027, for example), so running JmxTool with no arguments fails to fetch 
these metrics and outputs nothing while continuing to run.
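
One way to make the tool tolerant of such attributes is to fetch them one at a time and skip failures. A minimal sketch against the platform MBean server (hypothetical names, not JmxTool's actual structure):

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class SafeJmxDump {
    // Fetch attributes individually and skip the ones that throw (e.g.
    // unsupported MemoryPool attributes), instead of requiring every
    // attribute of every MBean to be retrievable before printing anything.
    static Map<String, Object> readableAttributes(MBeanServer server, ObjectName name) throws Exception {
        Map<String, Object> values = new LinkedHashMap<>();
        for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
            if (!attr.isReadable()) continue;
            try {
                values.put(attr.getName(), server.getAttribute(name, attr.getName()));
            } catch (Exception e) {
                // unsupported attribute: skip it rather than suppressing all output
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : server.queryNames(new ObjectName("java.lang:type=MemoryPool,*"), null)) {
            System.out.println(name + " -> " + readableAttributes(server, name).size() + " attributes");
        }
    }
}
```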





[jira] [Commented] (KAFKA-1667) topic-level configuration not validated

2014-10-06 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161318#comment-14161318
 ] 

Gwen Shapira commented on KAFKA-1667:
-

Looks like LogConfig has very few validations. 

I think it makes sense to refactor it to use either ConfigDef (actually part of 
the client code, but has really nice validators) or at least 
ValidatedProperties instead of just Properties - and then we can add proper 
validation for each config.

Any thoughts on whether ConfigDef can be used in the server, or if we should 
keep on using ValidatedProperties? 

>  topic-level configuration not validated
> 
>
> Key: KAFKA-1667
> URL: https://issues.apache.org/jira/browse/KAFKA-1667
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Ryan Berdeen
>  Labels: newbie
>
> I was able to set the configuration for a topic to these invalid values:
> {code}
> Topic:topic-config-test  PartitionCount:1ReplicationFactor:2 
> Configs:min.cleanable.dirty.ratio=-30.2,segment.bytes=-1,retention.ms=-12,cleanup.policy=lol
> {code}
> It seems that the values are saved as long as they are the correct type, but 
> are not validated like the corresponding broker-level properties.
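
A minimal sketch of the kind of per-property validation being discussed, using the property names from the report (the validator itself is hypothetical; ConfigDef or ValidatedProperties would do this declaratively):

```java
import java.util.Map;

public class TopicConfigValidator {
    // Reject values that parse but are semantically invalid, mirroring the
    // broker-level checks the report says are missing at the topic level.
    static void validate(Map<String, String> props) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            switch (e.getKey()) {
                case "min.cleanable.dirty.ratio": {
                    double v = Double.parseDouble(e.getValue());
                    if (v < 0.0 || v > 1.0)
                        throw new IllegalArgumentException(e.getKey() + " must be in [0,1]");
                    break;
                }
                case "segment.bytes":
                case "retention.ms": {
                    long v = Long.parseLong(e.getValue());
                    if (v <= 0)
                        throw new IllegalArgumentException(e.getKey() + " must be positive");
                    break;
                }
                case "cleanup.policy":
                    if (!e.getValue().equals("delete") && !e.getValue().equals("compact"))
                        throw new IllegalArgumentException("cleanup.policy must be delete or compact");
                    break;
            }
        }
    }

    public static void main(String[] args) {
        try {
            validate(Map.of("cleanup.policy", "lol")); // the invalid value from the report
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```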





Jenkins build is back to normal : Kafka-trunk #290

2014-10-06 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-1672) zk timeouts with examples from 8.1.1.1

2014-10-06 Thread Matthew Sandoz (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161350#comment-14161350
 ] 

Matthew Sandoz commented on KAFKA-1672:
---

OK, I reconfigured one of my routers. There was a flag on the AT&T router that 
was set to passthrough instead of "default server" - this was getting some 
packets mistakenly sent in a loop, I think. Weird, because it would work like 
every tenth to twentieth time...

Anyway, not a bug in your software - and thanks for the timely AND helpful 
guidance. Closing the ticket as not a bug.

> zk timeouts with examples from 8.1.1.1
> --
>
> Key: KAFKA-1672
> URL: https://issues.apache.org/jira/browse/KAFKA-1672
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, network
>Affects Versions: 0.8.1.1
> Environment: centos 6.4 x64 4gb RAM Java(TM) SE Runtime Environment 
> (build 1.7.0_21-b11)
>Reporter: Matthew Sandoz
>Assignee: Jun Rao
>
> i have two segments in my home network. 192.168.1.X and 192.168.2.X. My 
> servers are on the .1 subnet. Here's what I did.
> Install Kafka on my server on .1 subnet. run included zookeeper, run included 
> kafka server start script. run consumer. run producer.
> as long as thats all i did everything was fine.
> I then installed on a vm my .2 subnet. pointed producer and consumer to the 
> .1 zk and get only timeouts. i can get to each vm from each other vm.
> is there something special i need to do? i've tried playing around with the 
> config files - explicitly setting server.properties host.name to explicitly 
> refer to the server that zk is running on. 
> everything vanilla installed from version 
> kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz file.
> [kafka@vagrant-centos-6 kafka_2.9.2-0.8.1.1]$ bin/kafka-console-consumer.sh 
> --zookeeper chef-server.attlocal.net:2181 --topic test
> consumer properties:
> zookeeper.connect=chef-server.attlocal.net:2181
> zookeeper.connection.timeout.ms=100
> group.id=test-consumer-group
> Exception in thread "main" org.I0Itec.zkclient.exception.ZkTimeoutException: 
> Unable to connect to zookeeper server within timeout: 6000
> at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
> can i provide any other info?





[jira] [Resolved] (KAFKA-1672) zk timeouts with examples from 8.1.1.1

2014-10-06 Thread Matthew Sandoz (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Sandoz resolved KAFKA-1672.
---
Resolution: Not a Problem
  Reviewer: Matthew Sandoz

> zk timeouts with examples from 8.1.1.1
> --
>
> Key: KAFKA-1672
> URL: https://issues.apache.org/jira/browse/KAFKA-1672
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, network
>Affects Versions: 0.8.1.1
> Environment: centos 6.4 x64 4gb RAM Java(TM) SE Runtime Environment 
> (build 1.7.0_21-b11)
>Reporter: Matthew Sandoz
>Assignee: Jun Rao
>
> i have two segments in my home network. 192.168.1.X and 192.168.2.X. My 
> servers are on the .1 subnet. Here's what I did.
> Install Kafka on my server on .1 subnet. run included zookeeper, run included 
> kafka server start script. run consumer. run producer.
> as long as thats all i did everything was fine.
> I then installed on a vm my .2 subnet. pointed producer and consumer to the 
> .1 zk and get only timeouts. i can get to each vm from each other vm.
> is there something special i need to do? i've tried playing around with the 
> config files - explicitly setting server.properties host.name to explicitly 
> refer to the server that zk is running on. 
> everything vanilla installed from version 
> kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz file.
> [kafka@vagrant-centos-6 kafka_2.9.2-0.8.1.1]$ bin/kafka-console-consumer.sh 
> --zookeeper chef-server.attlocal.net:2181 --topic test
> consumer properties:
> zookeeper.connect=chef-server.attlocal.net:2181
> zookeeper.connection.timeout.ms=100
> group.id=test-consumer-group
> Exception in thread "main" org.I0Itec.zkclient.exception.ZkTimeoutException: 
> Unable to connect to zookeeper server within timeout: 6000
> at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
> can i provide any other info?





[jira] [Created] (KAFKA-1680) JmxTool exits if no arguments are given

2014-10-06 Thread Ryan Berdeen (JIRA)
Ryan Berdeen created KAFKA-1680:
---

 Summary: JmxTool exits if no arguments are given
 Key: KAFKA-1680
 URL: https://issues.apache.org/jira/browse/KAFKA-1680
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Ryan Berdeen
Priority: Minor


JmxTool has no required arguments, but it exits if no arguments are provided. 

You can work around this by passing a non-option argument, which will be 
ignored, e.g. {{./bin/kafka-run-class.sh kafka.tools.JmxTool xxx}}.

It looks like this was broken in KAFKA-1291 / 
6b0ae4bba0d0f8e4c8da19de65a8f03f162bec39



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1621) Standardize --messages option in perf scripts

2014-10-06 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161443#comment-14161443
 ] 

Manikumar Reddy commented on KAFKA-1621:


[~jkreps] would like to work on this issue. Are you referring 
kafka-consumer-perf-test.sh, kafka-producer-perf-test.sh,???,   scripts? 

> Standardize --messages option in perf scripts
> -
>
> Key: KAFKA-1621
> URL: https://issues.apache.org/jira/browse/KAFKA-1621
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Jay Kreps
>  Labels: newbie
>
> This option is specified in PerfConfig and is used by the producer, consumer 
> and simple consumer perf commands. The docstring on the argument does not 
> list it as required but the producer performance test requires it--others 
> don't.
> We should standardize this so that either all the commands require the option 
> and it is marked as required in the docstring or none of them list it as 
> required.





[jira] [Comment Edited] (KAFKA-1621) Standardize --messages option in perf scripts

2014-10-06 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161443#comment-14161443
 ] 

Manikumar Reddy edited comment on KAFKA-1621 at 10/7/14 4:13 AM:
-

[~jkreps]  I would like to work on this issue. Are you referring to the 
kafka-consumer-perf-test.sh, kafka-producer-perf-test.sh, ??? scripts? 


was (Author: omkreddy):
[~jkreps] would like to work on this issue. Are you referring 
kafka-consumer-perf-test.sh, kafka-producer-perf-test.sh,???,   scripts? 

> Standardize --messages option in perf scripts
> -
>
> Key: KAFKA-1621
> URL: https://issues.apache.org/jira/browse/KAFKA-1621
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Jay Kreps
>  Labels: newbie
>
> This option is specified in PerfConfig and is used by the producer, consumer 
> and simple consumer perf commands. The docstring on the argument does not 
> list it as required but the producer performance test requires it--others 
> don't.
> We should standardize this so that either all the commands require the option 
> and it is marked as required in the docstring or none of them list it as 
> required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [Java New Producer Kafka Trunk ] Need a State Check API Method

2014-10-06 Thread Bhavesh Mistry
I agree that if the producer is closed and you try to send a message, it
will fail. What we have done is wrap the new producer API with the old
producer API, and when I use the same code with the old producer I do not
get this issue; it only happens with the new producer. Regardless of the
close behavior, I think it would be good to have an API to check the state
of the producer (at least an isClosed() method).

If you agree, I can file a JIRA request for a state-check API; let me know
which flavor of state-check API you prefer.


Thanks,

Bhavesh
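As a rough illustration of the request, close-state tracking can be approximated on the user side, which is what Jay suggests in the quoted reply. This is a minimal sketch under assumptions: TrackedProducer is a hypothetical wrapper class that does not exist in Kafka, and the delegate calls to the real org.apache.kafka.clients.producer.KafkaProducer are elided as comments.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical user-side wrapper sketching the requested isClosed() check.
// The real KafkaProducer delegate calls are elided and marked in comments.
public class TrackedProducer {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public boolean isClosed() {
        return closed.get();
    }

    public void send(String record) {
        if (closed.get()) {
            throw new IllegalStateException(
                "Cannot send after the producer is closed.");
        }
        // ... delegate to the real producer.send(record) here ...
    }

    public void close() {
        // compareAndSet ensures the underlying close runs only once.
        if (closed.compareAndSet(false, true)) {
            // ... delegate to the real producer.close() here ...
        }
    }

    public static void main(String[] args) {
        TrackedProducer p = new TrackedProducer();
        p.send("msg");            // accepted while open
        p.close();
        try {
            p.send("msg2");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that this wrapper still has the check-then-act race discussed in the thread: a send() that passes the closed check can race with a concurrent close(). Eliminating that entirely would require synchronizing the check with the delegated call.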

On Mon, Oct 6, 2014 at 9:34 AM, Jay Kreps  wrote:

> Hey Bhavesh,
>
> This is a sanity check. If you send a message after calling close on the
> producer you should get this error. It sounds like you have multiple
> threads sending, and you close the producer in the middle of this, then you
> get this error. This is expected.
>
> Perhaps I am misunderstanding?
>
> I think tracking the state (i.e. whether you have called close or not) can
> be done just as easily in your code, right?
>
> -Jay
>
> On Sun, Oct 5, 2014 at 7:32 PM, Bhavesh Mistry  >
> wrote:
>
> > Hi Kafka Dev Team,
> >
> > *java.lang.*
> > *IllegalStateException: Cannot send after the producer is closed.*
> >
> > The above seems to be a bug. If a ProducerRecord is in flight (the send
> > method is executing) and another thread shuts the producer down
> > mid-flight, you will get this error.
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Sun, Oct 5, 2014 at 7:15 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com
> > >
> > wrote:
> >
> > > Hi Kafka Dev Team,
> > >
> > > The use case is that we need to know the producer state in background
> > > threads so that we can submit messages.
> > >
> > > This seems to be a bug in the trunk code. I have noticed that
> > > KafkaProducer itself does not expose a closed state, and in-flight
> > > messages will encounter the following exception. Should I file a bug
> > > for this issue?
> > >
> > > java.lang.IllegalStateException: Cannot send after the producer is
> > closed.
> > > at
> > >
> >
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:136)
> > > at
> > >
> >
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:237)
> > > .
> > > at java.util.TimerThread.mainLoop(Timer.java:555)
> > > at java.util.TimerThread.run(Timer.java:505)
> > >
> > > Thanks,
> > >
> > > Bhavesh
> > >
> > > On Sun, Oct 5, 2014 at 3:30 PM, Bhavesh Mistry <
> > mistry.p.bhav...@gmail.com
> > > > wrote:
> > >
> > >> HI Kafka Dev,
> > >>
> > >> I would like to request a state-check API so I can manage the life
> > >> cycle of the producer better. If you guys agree, I will file a JIRA
> > >> request. Based on the producer's state I would like to manage it:
> > >> start (create a new instance of the producer), restart, or close.
> > >> The states below are just an example; you may add or remove states.
> > >>
> > >> /**
> > >>  * API to check the state of the producer.
> > >>  * @return one of:
> > >>  *   STATE.INIT_IN_PROGRESS
> > >>  *   STATE.INIT_DONE
> > >>  *   STATE.RUNNING
> > >>  *   STATE.CLOSE_REQUESTED
> > >>  *   STATE.CLOSE_IN_PROGRESS
> > >>  *   STATE.CLOSED
> > >>  */
> > >> public State getCurrentState();
> > >>
> > >> Thanks,
> > >>
> > >> Bhavesh
> > >>
> > >
> > >
> >
>