[jira] [Updated] (KAFKA-3185) Allow users to cleanup internal data

2016-04-06 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-3185:
-
Labels: developer-experience  (was: )

> Allow users to cleanup internal data
> 
>
> Key: KAFKA-3185
> URL: https://issues.apache.org/jira/browse/KAFKA-3185
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kafka streams
>Reporter: Guozhang Wang
>Priority: Blocker
>  Labels: developer-experience
> Fix For: 0.10.1.0
>
>
> Currently the internal data is managed completely by the Kafka Streams framework 
> and users cannot clean it up actively. This results in a bad out-of-the-box 
> user experience, especially for running demo programs, since it leaves behind 
> internal data (changelog topics, RocksDB files, etc.) that needs to be cleaned 
> up manually. It would be better to add a
> {code}
> KafkaStreams.cleanup()
> {code}
> function call to clean up this internal data programmatically.
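
For illustration, a minimal sketch of how such a call might be used in a demo program, assuming the proposed cleanup() method wipes leftover local state between runs (the exact name and semantics are whatever the eventual API defines):
{code}
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class CleanupDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("application.id", "cleanup-demo");       // demo settings, adjust as needed
        props.put("bootstrap.servers", "localhost:9092");

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");  // trivial pass-through topology

        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.cleanup();   // proposed call: wipe leftover internal data before starting
        streams.start();
        Thread.sleep(10_000);
        streams.close();
        streams.cleanup();   // proposed call: wipe internal data again after the demo run
    }
}
{code}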



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3262) Make KafkaStreams debugging friendly

2016-04-06 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-3262:
-
Labels: developer-experience  (was: )

> Make KafkaStreams debugging friendly
> 
>
> Key: KAFKA-3262
> URL: https://issues.apache.org/jira/browse/KAFKA-3262
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kafka streams
>Affects Versions: 0.10.0.0
>Reporter: Yasuhiro Matsuda
>  Labels: developer-experience
> Fix For: 0.10.1.0
>
>
> Currently, KafkaStreams polls records in the same thread as the data processing 
> thread. This makes debugging user code, as well as KafkaStreams itself, 
> difficult. When the thread is suspended by the debugger, the next heartbeat 
> of the consumer tied to the thread won't be sent until the thread is resumed. 
> This often results in missed heartbeats and causes a group rebalance, so it 
> may well be a completely different context when the thread hits the breakpoint 
> the next time.
> We should consider using separate threads for polling and processing.
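
A rough sketch of that idea, assuming a plain blocking queue as the hand-off between a dedicated poll thread and a processing thread (names and structure are illustrative only, not the actual KafkaStreams design):
{code}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DecoupledPolling {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "debug-friendly-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        BlockingQueue<ConsumerRecord<String, String>> queue = new ArrayBlockingQueue<>(1024);

        // Poll thread: keeps calling poll() (and thus heartbeating) independently of processing.
        Thread poller = new Thread(() -> {
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("input-topic"));
                while (!Thread.currentThread().isInterrupted()) {
                    ConsumerRecords<String, String> records = consumer.poll(100);
                    for (ConsumerRecord<String, String> record : records) {
                        // Note: put() still blocks if the queue fills up; a real design would
                        // bound the lag differently (e.g. pause fetching) instead of blocking.
                        queue.put(record);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Processing thread: this is where breakpoints go while debugging user code.
        Thread processor = new Thread(() -> {
            try {
                while (true) {
                    ConsumerRecord<String, String> record = queue.take();
                    System.out.println(record.key() + " -> " + record.value());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        poller.start();
        processor.start();
    }
}
{code}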



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3337) Extract selector as a separate groupBy operator for KTable aggregations

2016-04-06 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-3337:
-
Labels: api-change newbie++  (was: newbie++)

> Extract selector as a separate groupBy operator for KTable aggregations
> ---
>
> Key: KAFKA-3337
> URL: https://issues.apache.org/jira/browse/KAFKA-3337
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kafka streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>  Labels: api-change, newbie++
> Fix For: 0.10.1.0
>
>
> Currently KTable aggregation takes a selector, used for selecting the 
> aggregate key, and an aggregator for aggregating the values with the same 
> selected key, which makes the function a little bit "heavy":
> {code}
> table.aggregate(initializer, adder, subtractor, selector, /* optional serde */);
> {code}
> It is better to extract the selector into a separate groupBy function such that
> {code}
> KTableGrouped KTable#groupBy(selector);
> KTable KTableGrouped#aggregate(initializer, adder, subtractor, /* optional 
> serde */);
> {code}
> Note that "KTableGrouped" only has APIs for aggregate and reduce, and nothing 
> else, so users have to follow the pattern below:
> {code}
> table.groupBy(...).aggregate(...);
> {code}
> This pattern is more natural for users who are familiar with SQL / Pig or 
> Spark DSL, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-02-13 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146250#comment-15146250
 ] 

Neha Narkhede commented on KAFKA-1464:
--

The most useful resource to throttle is network bandwidth usage by 
replication, as measured by the rate of total outgoing replication data on 
every leader. Adding the ability on every leader to cap data transferred under 
an upper limit is what we are looking for. This can be a config option similar 
to the one we have for the log cleaner. It seems to me that it is better to 
have the leader send less instead of having the replica fetch less, as the leader 
has a holistic view of the total amount of data being transferred out.
Data transferred from a leader includes:
- Fetch requests from an in-sync replica
- Fetch requests from an out-of-sync replica of a partition being reassigned
- Fetch requests from an out-of-sync replica of a partition not being reassigned
Data transferred across #1, #2 and #3 should stay roughly within the configured upper 
limit. If the limit is crossed, we want to start throttling requests, all 
except the ones that fall under #1. The leader can assign the remaining 
available bandwidth amongst partitions that fall under #2 and #3, allowing 
more bandwidth to #3 since presumably it is fine to let partitions being 
reassigned catch up more slowly than the rest. Throttling could involve returning 
fewer bytes, as determined by this computation, for each such partition, as Jay 
suggests.
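
As a simplified sketch of the accounting described above (a per-window byte budget on the leader, with #1 never throttled and #3 favored over #2); this is illustrative only, not the eventual Kafka design or configuration:
{code}
// Hypothetical leader-side throttle: tracks outgoing replication bytes per window and
// decides how many bytes a follower fetch may be granted once the cap is reached.
public class ReplicationThrottle {
    private final long maxBytesPerWindow;   // would come from a broker config, similar to the log cleaner's I/O cap
    private long bytesSentThisWindow = 0;

    public ReplicationThrottle(long maxBytesPerWindow) {
        this.maxBytesPerWindow = maxBytesPerWindow;
    }

    /**
     * @param requestedBytes bytes the fetch response would normally carry
     * @param inSync         case #1: fetches from in-sync replicas are never throttled
     * @param reassigning    case #2: reassignment traffic gets the smaller share of leftover budget
     * @return bytes actually granted for this fetch
     */
    public synchronized long grant(long requestedBytes, boolean inSync, boolean reassigning) {
        if (inSync) {
            bytesSentThisWindow += requestedBytes;
            return requestedBytes;                       // #1 always passes through
        }
        long remaining = Math.max(0, maxBytesPerWindow - bytesSentThisWindow);
        // Give the larger share of whatever budget remains to #3 (out-of-sync, not reassigning).
        long share = reassigning ? remaining / 4 : (remaining * 3) / 4;
        long granted = Math.min(requestedBytes, share);
        bytesSentThisWindow += granted;
        return granted;                                  // the leader simply returns fewer bytes when capped
    }

    public synchronized void resetWindow() {             // invoked once per throttle window
        bytesSentThisWindow = 0;
    }
}
{code}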

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: mjuarez
>Assignee: Ismael Juma
>Priority: Minor
>  Labels: replication, replication-tools
> Fix For: 0.9.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible. 
> This causes performance issues (mostly disk I/O and sometimes network 
> bandwidth) when doing this in a production environment in which you're 
> trying to serve downstream applications while you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems better handle maintenance 
> tasks while still serving downstream applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-02-13 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146250#comment-15146250
 ] 

Neha Narkhede edited comment on KAFKA-1464 at 2/13/16 11:36 PM:


The most useful resource to throttle is network bandwidth usage by 
replication, as measured by the rate of total outgoing replication data on 
every leader. Adding the ability on every leader to cap data transferred under 
an upper limit is what we are looking for. This can be a config option similar 
to the one we have for the log cleaner. It seems to me that it is better to 
have the leader send less instead of having the replica fetch less, as the leader 
has a holistic view of the total amount of data being transferred out.
Data transferred from a leader includes:
# Fetch requests from an in-sync replica
# Fetch requests from an out-of-sync replica of a partition being reassigned
# Fetch requests from an out-of-sync replica of a partition not being reassigned

Data transferred across #1, #2 and #3 should stay roughly within the configured upper 
limit. If the limit is crossed, we want to start throttling requests, all 
except the ones that fall under #1. The leader can assign the remaining 
available bandwidth amongst partitions that fall under #2 and #3, allowing 
more bandwidth to #3 since presumably it is fine to let partitions being 
reassigned catch up more slowly than the rest. Throttling could involve returning 
fewer bytes, as determined by this computation, for each such partition, as Jay 
suggests.


was (Author: nehanarkhede):
The most useful resource to throttle is network bandwidth usage by 
replication, as measured by the rate of total outgoing replication data on 
every leader. Adding the ability on every leader to cap data transferred under 
an upper limit is what we are looking for. This can be a config option similar 
to the one we have for the log cleaner. It seems to me that it is better to 
have the leader send less instead of having the replica fetch less, as the leader 
has a holistic view of the total amount of data being transferred out.
Data transferred from a leader includes:
- Fetch requests from an in-sync replica
- Fetch requests from an out-of-sync replica of a partition being reassigned
- Fetch requests from an out-of-sync replica of a partition not being reassigned
Data transferred across #1, #2 and #3 should stay roughly within the configured upper 
limit. If the limit is crossed, we want to start throttling requests, all 
except the ones that fall under #1. The leader can assign the remaining 
available bandwidth amongst partitions that fall under #2 and #3, allowing 
more bandwidth to #3 since presumably it is fine to let partitions being 
reassigned catch up more slowly than the rest. Throttling could involve returning 
fewer bytes, as determined by this computation, for each such partition, as Jay 
suggests.

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: mjuarez
>Assignee: Ismael Juma
>Priority: Minor
>  Labels: replication, replication-tools
> Fix For: 0.9.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible. 
> This causes performance issues (mostly disk I/O and sometimes network 
> bandwidth) when doing this in a production environment in which you're 
> trying to serve downstream applications while you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems better handle maintenance 
> tasks while still serving downstream applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-02-04 Thread Neha Narkhede (JIRA)
Neha Narkhede created KAFKA-3209:


 Summary: Support single message transforms in Kafka Connect
 Key: KAFKA-3209
 URL: https://issues.apache.org/jira/browse/KAFKA-3209
 Project: Kafka
  Issue Type: Improvement
  Components: copycat
Reporter: Neha Narkhede


Users should be able to perform light transformations on messages between a 
connector and Kafka. This is needed because some transformations must be 
performed before the data hits Kafka (e.g. filtering certain types of events or 
PII filtering). It's also useful for very light, single-message modifications 
that are easier to perform inline with the data import/export.
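
A rough sketch of what a light, single-message transform could look like conceptually; the interface and where it plugs in are assumptions for illustration, not an existing Connect API:
{code}
import java.util.regex.Pattern;

// Hypothetical single-message transform: takes one record value and returns a
// (possibly modified) value, or null to filter the record out before it reaches Kafka.
interface MessageTransform {
    String apply(String value);
}

// Example transform: mask email addresses (a simple PII filter) inline with the import.
class MaskEmails implements MessageTransform {
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w.-]+");

    @Override
    public String apply(String value) {
        return EMAIL.matcher(value).replaceAll("<redacted>");
    }
}

public class TransformDemo {
    public static void main(String[] args) {
        MessageTransform transform = new MaskEmails();
        String out = transform.apply("order 42 placed by jane.doe@example.com");
        System.out.println(out);  // prints: order 42 placed by <redacted>
    }
}
{code}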



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2807) Movement of throughput throttler to common broke upgrade tests

2015-11-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000969#comment-15000969
 ] 

Neha Narkhede commented on KAFKA-2807:
--

We don't want our tests to fail, so marking this as a blocker.

> Movement of throughput throttler to common broke upgrade tests
> --
>
> Key: KAFKA-2807
> URL: https://issues.apache.org/jira/browse/KAFKA-2807
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Geoff Anderson
>Priority: Blocker
> Fix For: 0.9.0.0
>
>
> In order to run compatibility tests with a 0.8.2 producer using 
> VerifiableProducer, we use the 0.8.2 kafka-run-tools.sh classpath augmented 
> by the 0.9.0 tools and tools-dependencies classpaths.
> Recently, some refactoring efforts moved ThroughputThrottler to the 
> org.apache.kafka.common.utils package, but this breaks the existing 
> compatibility tests:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/kafka/common/utils/ThroughputThrottler
> at 
> org.apache.kafka.tools.VerifiableProducer.main(VerifiableProducer.java:334)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.kafka.common.utils.ThroughputThrottler
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 1 more
> {code}
> Given the need to be able to run VerifiableProducer against 0.8.X, I'm not 
> sure VerifiableProducer can depend on org.apache.kafka.common.utils at this 
> point in time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2807) Movement of throughput throttler to common broke upgrade tests

2015-11-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2807:
-
Priority: Blocker  (was: Major)

> Movement of throughput throttler to common broke upgrade tests
> --
>
> Key: KAFKA-2807
> URL: https://issues.apache.org/jira/browse/KAFKA-2807
> Project: Kafka
>  Issue Type: Bug
>Reporter: Geoff Anderson
>Assignee: Geoff Anderson
>Priority: Blocker
> Fix For: 0.9.0.0
>
>
> In order to run compatibility tests with a 0.8.2 producer using 
> VerifiableProducer, we use the 0.8.2 kafka-run-tools.sh classpath augmented 
> by the 0.9.0 tools and tools-dependencies classpaths.
> Recently, some refactoring efforts moved ThroughputThrottler to the 
> org.apache.kafka.common.utils package, but this breaks the existing 
> compatibility tests:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/kafka/common/utils/ThroughputThrottler
> at 
> org.apache.kafka.tools.VerifiableProducer.main(VerifiableProducer.java:334)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.kafka.common.utils.ThroughputThrottler
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 1 more
> {code}
> Given the need to be able to run VerifiableProducer against 0.8.X, I'm not 
> sure VerifiableProducer can depend on org.apache.kafka.common.utils at this 
> point in time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2517) Performance Regression post SSL implementation

2015-09-08 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2517:
-
Priority: Blocker  (was: Critical)

> Performance Regression post SSL implementation
> --
>
> Key: KAFKA-2517
> URL: https://issues.apache.org/jira/browse/KAFKA-2517
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
>Assignee: Ben Stopford
>Priority: Blocker
> Fix For: 0.8.3
>
>
> It would appear that we incurred a performance regression on submission of 
> the SSL work affecting the performance of the new Kafka Consumer. 
> Running with 1KB messages. Macbook 2.3 GHz Intel Core i7, 8GB, APPLE SSD 
> SM256E. Single server instance. All local. 
> kafka-consumer-perf-test.sh ... --messages 300  --new-consumer
> Pre-SSL changes (commit 503bd36647695e8cc91893ffb80346dd03eb0bc5)
> Steady state throughput = 234.8 MB/s
> (2861.5913, 234.8261, 3000596, 246233.0543)
> Post-SSL changes (commit 13c432f7952de27e9bf8cb4adb33a91ae3a4b738) 
> Steady state throughput =  178.1 MB/s  
> (2861.5913, 178.1480, 3000596, 186801.7182)
> Implication is a 25% reduction in consumer throughput for these test 
> conditions. 
> This appears to be caused by the use of PlaintextTransportLayer rather than 
> SocketChannel in FileMessageSet.writeTo() meaning a zero copy transfer is not 
> invoked.
> Switching to the use of a SocketChannel directly in FileMessageSet.writeTo()  
> yields the following result:
> Steady state throughput =  281.8 MB/s
> (2861.5913, 281.8191, 3000596, 295508.7650)
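
For context, the zero-copy path being lost is the FileChannel.transferTo() route sketched below, contrasted with a buffered copy; paths, sizes and the target address are placeholders:
{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopyDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = new FileInputStream("/tmp/segment.log").getChannel();
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9092))) {

            // Zero-copy: bytes move from the page cache to the socket without entering user space.
            long sent = file.transferTo(0, file.size(), socket);
            System.out.println("transferTo sent " + sent + " bytes");

            // Buffered copy for comparison: every byte is copied through a user-space buffer.
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            file.position(0);
            while (file.read(buf) > 0) {
                buf.flip();
                while (buf.hasRemaining()) {
                    socket.write(buf);
                }
                buf.clear();
            }
        }
    }
}
{code}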



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2403) Expose offset commit metadata in new consumer

2015-08-22 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708125#comment-14708125
 ] 

Neha Narkhede commented on KAFKA-2403:
--

[~hachikuji] [~ewencp] Cross posting my thoughts on this change here for more 
visibility:
bq. On the other hand, maybe most users don't even specify the offsets manually 
anyway and the concern here is unwarranted since 99% of the cases are handled 
by commit(CommitType) and commit(CommitType, ConsumerCommitCallback)

I think manual offset commit is really a very small percentage of all uses. 
Even though I agree that amongst that minority, fewer would have custom 
metadata, I'm not sure it is worth adding the extra commitWithMetadata API for.

It may be ok in this case to go with 
public void commit(Map<TopicPartition, OffsetMetadata> offsets, CommitType commitType);
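
As a sketch of what that single overload might look like at the call site; the OffsetMetadata container and CommitType are the proposals quoted below, not a shipped API, so this is illustrative only:
{code}
// Hypothetical usage of the proposed overload: commit offsets with per-partition metadata.
Map<TopicPartition, OffsetMetadata> offsets = new HashMap<>();
offsets.put(new TopicPartition("events", 0),
            new OffsetMetadata(1042L, "processed-by=worker-7"));
consumer.commit(offsets, CommitType.SYNC);

// Reading the committed offset and its metadata back for a partition.
OffsetMetadata committed = consumer.committed(new TopicPartition("events", 0));
{code}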

 Expose offset commit metadata in new consumer
 -

 Key: KAFKA-2403
 URL: https://issues.apache.org/jira/browse/KAFKA-2403
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jason Gustafson
Assignee: Jason Gustafson
Priority: Minor

 The offset commit protocol supports the ability to add user metadata to 
 commits, but this is not yet exposed in the new consumer API. The 
 straightforward way to add it is to create a container for the offset and 
 metadata and adjust the KafkaConsumer API accordingly.
 {code}
 OffsetMetadata {
   long offset;
   String metadata;
 }
 KafkaConsumer {
   commit(Map<TopicPartition, OffsetMetadata>)
   OffsetMetadata committed(TopicPartition)
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2367) Add Copycat runtime data API

2015-08-18 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702063#comment-14702063
 ] 

Neha Narkhede commented on KAFKA-2367:
--

I think there are various tradeoffs, as with most choices that a framework is 
presented with :)

The tradeoffs I see are:
1. Agility vs maturity: The maturity argument is that Avro is an advanced 
serialization library that already exists and in spite of having been through 
various compatibility issues, is now well tested and adopted. The agility 
argument against Avro is that for a new framework like Copycat, we might be 
able to move faster (over several releases) by owning and fixing our runtime 
data model, while not waiting for the Avro community to release a patched 
version. This is a problem we struggled with for ZkClient, codahale-metrics and 
ZooKeeper on the core Kafka side, and though one can argue that the Avro 
community is better, this still remains a concern. The success of the Copycat 
framework depends on its ability to always be the present framework for copying 
data to Kafka and as an early project, agility is key.
2. Cost/time savings vs control: The cost/time saving argument goes for 
adopting Avro even if we really need a very small percentage of it. This does 
save us a little time upfront but the downside is that now we end up having 
Copycat depend on Avro (and all its dependencies). I'm totally in favor of 
using a mature open source library but observing the size of the code we need 
to pull from Avro, I couldn't convince myself of the benefit it presents in 
saving some effort upfront. After all, there will be bugs in either codebase, 
we'd have to find the fastest way to fix those.
3. Generic public interface to encourage connector developers: This is a very 
right-brain argument and a subtle one. I agree with [~jkreps] here. Given 
that our goal should be to attract a large ecosystem of connectors, I would 
want us to remove every bit of pain and friction that would cause connector 
developers to either question our choice of Avro or spend time clarifying its 
impact on them. I understand that in practice this isn't a concern, and as long 
as we have the right serializers this will not even be quite so visible, but a 
simple SchemaBuilder imported from org.apache.avro can start this discussion 
and distract connector developers who aren't necessarily Avro fans. 

Overall, given the tradeoffs, I'm leaning towards us picking a custom one and 
not depending on all of Avro. 

 Add Copycat runtime data API
 

 Key: KAFKA-2367
 URL: https://issues.apache.org/jira/browse/KAFKA-2367
 Project: Kafka
  Issue Type: Sub-task
  Components: copycat
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
 Fix For: 0.8.3


 Design the API used for runtime data in Copycat. This API is used to 
 construct schemas and records that Copycat processes. This needs to be a 
 fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to 
 support complex, varied data types that may be input from/output to many data 
 systems.
 This issue should also address the serialization interfaces used 
 within Copycat, which translate the runtime data into serialized byte[] form. 
 It is important that these be considered together because the data format can 
 be used in multiple ways (records, partition IDs, partition offsets), so it 
 and the corresponding serializers must be sufficient for all these use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2365) Copycat checklist

2015-07-27 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643087#comment-14643087
 ] 

Neha Narkhede commented on KAFKA-2365:
--

It is worth discussing a process for including a connector in Kafka core, but I think 
we went through this in the KIP discussion and here is what I think. To keep 
packaging, review and code management easier, it is better to just include a 
couple of lightweight connectors, enough to show the usage of the Copycat APIs 
(file in/file out). Any connector that requires depending on an external system 
(HDFS, Elasticsearch) should really live elsewhere. We should also delete the 
ones under contrib; they never ended up getting supported by the community. 

Since there will always be connectors that need to live outside Kafka, I think 
we should instead focus on how to make tooling easier for discovering and using 
these federated connectors. 

 Copycat checklist
 -

 Key: KAFKA-2365
 URL: https://issues.apache.org/jira/browse/KAFKA-2365
 Project: Kafka
  Issue Type: New Feature
  Components: copycat
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
  Labels: feature
 Fix For: 0.8.3


 This covers the development plan for 
 [KIP-26|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767].
  There are a number of features that can be developed in sequence to make 
 incremental progress, and often in parallel:
 * Initial patch - connector API and core implementation
 * Runtime data API
 * Standalone CLI
 * REST API
 * Distributed copycat - CLI
 * Distributed copycat - coordinator
 * Distributed copycat - config storage
 * Distributed copycat - offset storage
 * Log/file connector (sample source/sink connector)
 * Elasticsearch sink connector (sample sink connector for a full log -> Kafka 
 -> Elasticsearch sample pipeline)
 * Copycat metrics
 * System tests (including connector tests)
 * Mirrormaker connector
 * Copycat documentation
 This is an initial list, but it might need refinement to allow for more 
 incremental progress and may be missing features we find we want before the 
 initial release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2350) Add KafkaConsumer pause capability

2015-07-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642229#comment-14642229
 ] 

Neha Narkhede commented on KAFKA-2350:
--

Few thoughts:
bq. 1. Add pause/unpause
bq. 2. Allow subscribe(topic) followed by unsubscribe(partition) to subscribe 
to a topic but suppress a partition
bq. 3. Either of the above but making the use of group management explicit 
using an enable.group.management flag

I'm in favor of 1. for several reasons:
1. It keeps the API semantics clean. subscribe/unsubscribe indicates intent to 
consume data, while pause/resume indicates a *temporary* preference for the 
purposes of flow control.
2. It avoids all the different permutations of subscribe/unsubscribe that we 
will need to worry about and each one of those would have to make sense and be 
explained clearly to the user. This discussion is confusing enough that I'm 
convinced that it will not be easy.
3. pause/resume moves the consumer to a different state in its state diagram. 
Overloading the same API to represent two different states is unintuitive.

Also +1 on - 
1. Renaming unpause to resume.
2. Not maintaining the pause/resume preference across consumer rebalances.

There may be complications in the implementation of the above preferences that 
I may have overlooked, but I feel we should design APIs for the right behavior 
and figure out the implementation related issues that might come up as a 
result. 


 Add KafkaConsumer pause capability
 --

 Key: KAFKA-2350
 URL: https://issues.apache.org/jira/browse/KAFKA-2350
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson

 There are some use cases in stream processing where it is helpful to be able 
 to pause consumption of a topic. For example, when joining two topics, you 
 may need to delay processing of one topic while you wait for the consumer of 
 the other topic to catch up. The new consumer currently doesn't provide a 
 nice way to do this. If you skip poll() or if you unsubscribe, then a 
 rebalance will be triggered and your partitions will be reassigned.
 One way to achieve this would be to add two new methods to KafkaConsumer:
 {code}
 void pause(String... topics);
 void unpause(String... topics);
 {code}
 When a topic is paused, a call to KafkaConsumer.poll will not initiate any 
 new fetches for that topic. After it is unpaused, fetches will begin again.
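
A sketch of the join-style flow-control use case under the proposed topic-level calls; the signatures are the ones quoted above (with unpause renamed to resume, as discussed in the comment), so treat this as illustrative only:
{code}
// Hypothetical: pause the faster topic while the other side of the join catches up,
// without unsubscribing and therefore without triggering a rebalance.
consumer.subscribe(Arrays.asList("clicks", "impressions"));
while (running) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    process(records);

    if (clicksFarAheadOfImpressions()) {       // illustrative lag check, not a real API
        consumer.pause("clicks");              // proposed API: stop fetching, stay in the group
    } else {
        consumer.resume("clicks");             // proposed rename of unpause()
    }
}
{code}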



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2350) Add KafkaConsumer pause capability

2015-07-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642229#comment-14642229
 ] 

Neha Narkhede edited comment on KAFKA-2350 at 7/27/15 3:55 AM:
---

Few thoughts:
bq. 1. Add pause/unpause
bq. 2. Allow subscribe(topic) followed by unsubscribe(partition) to subscribe 
to a topic but suppress a partition
bq. 3. Either of the above but making the use of group management explicit 
using an enable.group.management flag

I'm in favor of 1. for several reasons:
1. It keeps the API semantics clean. subscribe/unsubscribe indicates intent to 
consume data, while pause/resume indicates a *temporary* preference for the 
purposes of flow control.
2. It avoids all the different permutations of subscribe/unsubscribe that we 
will need to worry about and each one of those would have to make sense and be 
explained clearly to the user. This discussion is confusing enough that I'm 
convinced that it will not be easy.
3. pause/resume moves the consumer to a different state in its state diagram. 
Overloading the same API to represent two different states is unintuitive.

Also +1 on - 
1. Renaming unpause to resume.
2. Not maintaining the pause/resume preference across consumer rebalances.

Also not in favor of adding the enable.group.management config. I agree with 
Jay that adding the config will just complicate the semantics and reduce 
operational simplicity, increasing the number of ways the API calls made by the 
user would not behave as expected. 

There may be complications in the implementation of the above preferences that 
I may have overlooked, but I feel we should design APIs for the right behavior 
and figure out the implementation related issues that might come up as a 
result. 


was (Author: nehanarkhede):
Few thoughts:
bq. 1. Add pause/unpause
bq. 2. Allow subscribe(topic) followed by unsubscribe(partition) to subscribe 
to a topic but suppress a partition
bq. 3. Either of the above but making the use of group management explicit 
using an enable.group.management flag

I'm in favor of 1. for several reasons:
1. It keeps the API semantics clean. subscribe/unsubscribe indicates intent to 
consume data, while pause/resume indicates a *temporary* preference for the 
purposes of flow control.
2. It avoids all the different permutations of subscribe/unsubscribe that we 
will need to worry about and each one of those would have to make sense and be 
explained clearly to the user. This discussion is confusing enough that I'm 
convinced that it will not be easy.
3. pause/resume moves the consumer to a different state in its state diagram. 
Overloading the same API to represent two different states is unintuitive.

Also +1 on - 
1. Renaming unpause to resume.
2. Not maintaining the pause/resume preference across consumer rebalances.

There may be complications in the implementation of the above preferences that 
I may have overlooked, but I feel we should design APIs for the right behavior 
and figure out the implementation related issues that might come up as a 
result. 


 Add KafkaConsumer pause capability
 --

 Key: KAFKA-2350
 URL: https://issues.apache.org/jira/browse/KAFKA-2350
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson

 There are some use cases in stream processing where it is helpful to be able 
 to pause consumption of a topic. For example, when joining two topics, you 
 may need to delay processing of one topic while you wait for the consumer of 
 the other topic to catch up. The new consumer currently doesn't provide a 
 nice way to do this. If you skip poll() or if you unsubscribe, then a 
 rebalance will be triggered and your partitions will be reassigned.
 One way to achieve this would be to add two new methods to KafkaConsumer:
 {code}
 void pause(String... topics);
 void unpause(String... topics);
 {code}
 When a topic is paused, a call to KafkaConsumer.poll will not initiate any 
 new fetches for that topic. After it is unpaused, fetches will begin again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2311) Consumer's ensureNotClosed method not thread safe

2015-07-14 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2311:
-
Reviewer: Neha Narkhede

 Consumer's ensureNotClosed method not thread safe
 -

 Key: KAFKA-2311
 URL: https://issues.apache.org/jira/browse/KAFKA-2311
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Tim Brooks
Assignee: Tim Brooks
 Attachments: KAFKA-2311.patch, KAFKA-2311.patch


 When a call to the consumer is made, the first check is to see that the 
 consumer is not closed. This variable is not volatile, so there is no 
 guarantee that previous stores will be visible before a read.
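
A minimal illustration of the kind of fix involved, assuming the flag is switched to an AtomicBoolean (the actual patch may differ, e.g. it could simply mark the field volatile):
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: a "closed" flag that is safe to read from any thread.
public class ClosableClient {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    private void ensureNotClosed() {
        if (closed.get()) {                      // atomic read; no stale value can be observed
            throw new IllegalStateException("This consumer has already been closed.");
        }
    }

    public void poll() {
        ensureNotClosed();
        // ... do work ...
    }

    public void close() {
        closed.set(true);                        // write is immediately visible to other threads
    }
}
{code}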



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2321) Introduce CONTRIBUTING.md

2015-07-14 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2321:
-
Reviewer: Neha Narkhede

 Introduce CONTRIBUTING.md
 -

 Key: KAFKA-2321
 URL: https://issues.apache.org/jira/browse/KAFKA-2321
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Ismael Juma
 Attachments: KAFKA-2321.patch


 This file is displayed when people create a pull request in GitHub. It should 
 link to the relevant pages in the wiki and website.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2323) Simplify ScalaTest dependency versions

2015-07-14 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2323:
-
Reviewer: Jun Rao

 Simplify ScalaTest dependency versions
 --

 Key: KAFKA-2323
 URL: https://issues.apache.org/jira/browse/KAFKA-2323
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Minor
 Attachments: KAFKA-2323.patch


 We currently use the following ScalaTest versions:
 * 1.8 for Scala 2.9.x
 * 1.9.1 for Scala 2.10.x
 * 2.2.0 for Scala 2.11.x
 I propose we simplify it to:
 * 1.9.1 for Scala 2.9.x
 * 2.2.5 for every other Scala version (currently 2.10.x and 2.11.x)
 And since we will drop support for Scala 2.9.x soon, the conditional 
 check for ScalaTest can then be removed altogether.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2236) offset request reply racing with segment rolling

2015-06-02 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2236:
-
Assignee: Jason Gustafson

 offset request reply racing with segment rolling
 

 Key: KAFKA-2236
 URL: https://issues.apache.org/jira/browse/KAFKA-2236
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2.0
 Environment: Linux x86_64, java.1.7.0_72, discovered using librdkafka 
 based client.
Reporter: Alfred Landrum
Assignee: Jason Gustafson
Priority: Critical
  Labels: newbie

 My use case with Kafka involves an aggressive retention policy that rolls 
 segment files frequently. My librdkafka-based client sees occasional errors 
 on offset requests, showing up in the broker log like:
 [2015-06-02 02:33:38,047] INFO Rolled new log segment for 
 'receiver-93b40462-3850-47c1-bcda-8a3e221328ca-50' in 1 ms. (kafka.log.Log)
 [2015-06-02 02:33:38,049] WARN [KafkaApi-0] Error while responding to offset 
 request (kafka.server.KafkaApis)
 java.lang.ArrayIndexOutOfBoundsException: 3
 at kafka.server.KafkaApis.fetchOffsetsBefore(KafkaApis.scala:469)
 at kafka.server.KafkaApis.fetchOffsets(KafkaApis.scala:449)
 at kafka.server.KafkaApis$$anonfun$17.apply(KafkaApis.scala:411)
 at kafka.server.KafkaApis$$anonfun$17.apply(KafkaApis.scala:402)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
 at 
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 at scala.collection.AbstractTraversable.map(Traversable.scala:105)
 at kafka.server.KafkaApis.handleOffsetRequest(KafkaApis.scala:402)
 at kafka.server.KafkaApis.handle(KafkaApis.scala:61)
 at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
 at java.lang.Thread.run(Thread.java:745)
 Quoting Guozhang Wang's reply to my query on the users list:
 I checked the 0.8.2 code and probably found a bug related to your issue.
 Basically, segsArray.last.size is called multiple times during handling of
 offset requests, while segsArray.last could get concurrent appends. Hence
 it is possible that in line 461, if(segsArray.last.size > 0) returns false
 while later in line 468, if(segsArray.last.size > 0) could return true.
 http://mail-archives.apache.org/mod_mbox/kafka-users/201506.mbox/%3CCAHwHRrUK-3wdoEAaFbsD0E859Ea0gXixfxgDzF8E3%3D_8r7K%2Bpw%40mail.gmail.com%3E
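
The check-then-act hazard described in that reply can be illustrated with a small sketch: reading the size once into a local and basing every later decision on that snapshot avoids the two checks disagreeing (illustrative Java, not the actual Scala code in KafkaApis):
{code}
import java.util.concurrent.atomic.AtomicLong;

public class SnapshotReadDemo {
    // Stands in for segsArray.last.size, which another thread keeps appending to.
    private final AtomicLong lastSegmentSize = new AtomicLong(0);

    public void append(long bytes) {             // concurrent appender (the log roll/append path)
        lastSegmentSize.addAndGet(bytes);
    }

    public void handleOffsetRequest() {
        long size = lastSegmentSize.get();       // read once
        if (size > 0) {
            // Every subsequent decision uses the same snapshot, so the branches stay
            // consistent even while append() keeps running concurrently.
        }
    }
}
{code}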



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2187) Introduce merge-kafka-pr.py script

2015-06-02 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569499#comment-14569499
 ] 

Neha Narkhede commented on KAFKA-2187:
--

[~ijuma] As I went through the patch again, I caught a few places that still 
refer to Spark instead of Kafka. If you can fix those, I'll check it in. 

 Introduce merge-kafka-pr.py script
 --

 Key: KAFKA-2187
 URL: https://issues.apache.org/jira/browse/KAFKA-2187
 Project: Kafka
  Issue Type: New Feature
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Minor
 Attachments: KAFKA-2187.patch, KAFKA-2187_2015-05-20_23:14:05.patch


 This script will be used to merge GitHub pull requests and it will pull from 
 the Apache Git repo to the current branch, squash and merge the PR, push the 
 commit to trunk, close the PR (via commit message) and close the relevant 
 JIRA issue (via JIRA API).
 Spark has a script that does most (if not all) of this and that will be used 
 as the starting point:
 https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2231) Deleting a topic fails

2015-05-31 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566889#comment-14566889
 ] 

Neha Narkhede commented on KAFKA-2231:
--

[~JGH] Thanks for reporting the issue. Would you mind also trying to reproduce 
this on trunk?

 Deleting a topic fails
 --

 Key: KAFKA-2231
 URL: https://issues.apache.org/jira/browse/KAFKA-2231
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.1
 Environment: Windows 8.1
Reporter: James G. Haberly
Priority: Minor

 delete.topic.enable=true is in config\server.properties.
 Using --list shows the topic marked for deletion.
 Stopping and restarting kafka and zookeeper does not delete the topic; it 
 remains marked for deletion.
 Trying to recreate the topic fails with Topic XXX already exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2233) Log deletion is not removing log metrics

2015-05-31 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2233:
-
Labels: newbie  (was: )

 Log deletion is not removing log metrics
 

 Key: KAFKA-2233
 URL: https://issues.apache.org/jira/browse/KAFKA-2233
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.8.2.1
Reporter: Stevo Slavic
Assignee: Jay Kreps
Priority: Minor
  Labels: newbie

 Topic deletion does not remove associated metrics. Any configured Kafka 
 metric reporter that gets triggered after a topic is deleted will, when polling 
 log metrics for such deleted logs, throw something like:
 {noformat}
 java.util.NoSuchElementException
 at 
 java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2299)
 at 
 java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2326)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
 at scala.collection.IterableLike$class.head(IterableLike.scala:107)
 at scala.collection.AbstractIterable.head(Iterable.scala:54)
 at kafka.log.Log.logStartOffset(Log.scala:502)
 at kafka.log.Log$$anon$2.value(Log.scala:86)
 at kafka.log.Log$$anon$2.value(Log.scala:85)
 {noformat}
 since on log deletion the {{Log}} segments collection gets cleared, so the 
 logSegments {{Iterable}} has no (next) elements.
 A known workaround is to restart the broker: as the metric registry is in memory, not 
 persisted, on restart it will be recreated with metrics for 
 existing/non-deleted topics only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2187) Introduce merge-kafka-pr.py script

2015-05-31 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2187:
-
Reviewer: Neha Narkhede

 Introduce merge-kafka-pr.py script
 --

 Key: KAFKA-2187
 URL: https://issues.apache.org/jira/browse/KAFKA-2187
 Project: Kafka
  Issue Type: New Feature
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Minor
 Attachments: KAFKA-2187.patch, KAFKA-2187_2015-05-20_23:14:05.patch


 This script will be used to merge GitHub pull requests and it will pull from 
 the Apache Git repo to the current branch, squash and merge the PR, push the 
 commit to trunk, close the PR (via commit message) and close the relevant 
 JIRA issue (via JIRA API).
 Spark has a script that does most (if not all) of this and that will be used 
 as the starting point:
 https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2187) Introduce merge-kafka-pr.py script

2015-05-31 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566884#comment-14566884
 ] 

Neha Narkhede commented on KAFKA-2187:
--

[~ijuma] This looks great. I can check it in, would you mind updating the wiki 
too 
https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review ?


 Introduce merge-kafka-pr.py script
 --

 Key: KAFKA-2187
 URL: https://issues.apache.org/jira/browse/KAFKA-2187
 Project: Kafka
  Issue Type: New Feature
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Minor
 Attachments: KAFKA-2187.patch, KAFKA-2187_2015-05-20_23:14:05.patch


 This script will be used to merge GitHub pull requests and it will pull from 
 the Apache Git repo to the current branch, squash and merge the PR, push the 
 commit to trunk, close the PR (via commit message) and close the relevant 
 JIRA issue (via JIRA API).
 Spark has a script that does most (if not all) of this and that will be used 
 as the starting point:
 https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely

2015-05-27 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562061#comment-14562061
 ] 

Neha Narkhede commented on KAFKA-2168:
--

There are tradeoffs to having multiple threads per consumer instance vs. having 
a consumer instance per thread. The consumer code is simpler and the throughput 
is better in the latter design, but the number of TCP connections is lower in 
the former. Some of the concerns [~ewencp] brings up above can be mitigated 
if there is a separate consumer instance per user thread and others can be 
mitigated by the user picking the right timeout on poll() that they are 
comfortable blocking on. All of this would mean explicitly stating that the 
consumer APIs are not threadsafe and that the user should create multiple 
consumer instances across threads instead of sharing one. We still need to make 
sure close() can be called from a separate thread as [~ewencp] correctly points 
out, though the change isn't large if we go down this route. 

It seems like it is simpler to stick to the original intention of the design 
and not share consumer instances across threads? 

 New consumer poll() can block other calls like position(), commit(), and 
 close() indefinitely
 -

 Key: KAFKA-2168
 URL: https://issues.apache.org/jira/browse/KAFKA-2168
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Reporter: Ewen Cheslack-Postava
Assignee: Jason Gustafson

 The new consumer is currently using very coarse-grained synchronization. For 
 most methods this isn't a problem since they finish quickly once the lock is 
 acquired, but poll() might run for a long time (and commonly will since 
 polling with long timeouts is a normal use case). This means any operations 
 invoked from another thread may block until the poll() call completes.
 Some example use cases where this can be a problem:
 * A shutdown hook is registered to trigger shutdown and invokes close(). It 
 gets invoked from another thread and blocks indefinitely.
 * User wants to manage offset commits themselves in a background thread. If 
 the commit policy is not purely time-based, it's not currently possible to 
 make sure the call to commit() will be processed promptly.
 Two possible solutions to this:
 1. Make sure a lock is not held during the actual select call. Since we have 
 multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) 
 this is probably hard to make work cleanly since locking is currently only 
 performed at the KafkaConsumer level and we'd want it unlocked around a 
 single line of code in Selector.
 2. Wake up the selector before synchronizing for certain operations. This 
 would require some additional coordination to make sure the caller of 
 wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() 
 thread being woken up and then promptly reacquiring the lock with a 
 subsequent long poll() call).
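
For reference, option 2 roughly corresponds to the wake-up-from-another-thread pattern sketched below (a simplified shutdown-hook example; settings and topic names are placeholders):
{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class ShutdownHookDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "shutdown-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("input-topic"));

        Thread mainThread = Thread.currentThread();
        // The hook runs on another thread; wakeup() forces a blocked poll() to return
        // (by throwing WakeupException) so the consumer can be closed promptly.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            consumer.wakeup();
            try {
                mainThread.join();
            } catch (InterruptedException ignored) {
            }
        }));

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                records.forEach(r -> System.out.println(r.value()));
            }
        } catch (WakeupException e) {
            // Expected on shutdown: fall through and close the consumer.
        } finally {
            consumer.close();
        }
    }
}
{code}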



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2197) Controller not able to update state for broker on the same machine

2015-05-15 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545751#comment-14545751
 ] 

Neha Narkhede commented on KAFKA-2197:
--

[~bedwards] Unassigning myself as I probably won't get around to this, but 
you might have better visibility getting to the root cause on the mailing list 
(if you haven't already).

 Controller not able to update state for broker on the same machine
 --

 Key: KAFKA-2197
 URL: https://issues.apache.org/jira/browse/KAFKA-2197
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 0.8.2.1
 Environment: docker 1.5, 64 bit Linux (4.0.1-1).
Reporter: Ben Edwards
 Attachments: docker-compose-stripped.yml


 I am using kafka on docker. When I try to create a topic the controller seems 
 to get stuck and the topic is never usable for consumers or producers.
 {noformat}
 [2015-05-15 15:51:10,201] WARN [Controller-9092-to-broker-9092-send-thread], 
 Controller 9092 epoch 1 fails to send request 
 Name:LeaderAndIsrRequest;Version:0;Controller:9092;ControllerEpoch:1;CorrelationId:7;ClientId:id_9092-host_null-port_9092;Leaders:id:9092,host:kafka,port:9092;PartitionState:(lol,0)
  - 
 (LeaderAndIsrInfo:(Leader:9092,ISR:9092,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:1),AllReplicas:9092)
  to broker id:9092,host:kafka,port:9092. Reconnecting to broker. 
 (kafka.controller.RequestSendThread)
 java.nio.channels.ClosedChannelException
   at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
   at 
 kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
   at 
 kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 {noformat}
 Repro steps:
 run docker-compose up with the attached docker-compose yaml file.
 enter the kafka container (get the name with {{docker ps}}, {{docker exec -it 
 name bash}} to enter).
 run the following 
 {noformat}
 cd /opt/kafka_2.10-0.8.2.1/
 ./bin/kafka-topics.sh --list --zookeeper zk:2181
 ./bin/kafka-topics.sh --create --zookeeper zk:2181 --topic lol 
 --replicatior-factor 1 --partitions 1
 ./bin/kafka-topics.sh --create --zookeeper zk:2181 --topic testing 
 --replication-factor 1 --partitions 1
 ./bin/kafka-topics.sh --list --zookeeper zk:2181
 tail -f logs/controller.log 
 {noformat}
 This should allow you to observe the controller being upset. The zookeeper 
 instance is definitely reachable, the hostnames are correct as far as I can 
 tell. I am kind of at a loss as to what is happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2197) Controller not able to update state for broker on the same machine

2015-05-15 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2197:
-
Assignee: (was: Neha Narkhede)

 Controller not able to update state for broker on the same machine
 --

 Key: KAFKA-2197
 URL: https://issues.apache.org/jira/browse/KAFKA-2197
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 0.8.2.1
 Environment: docker 1.5, 64 bit Linux (4.0.1-1).
Reporter: Ben Edwards
 Attachments: docker-compose-stripped.yml


 I am using kafka on docker. When I try to create a topic the controller seems 
 to get stuck and the topic is never usable for consumers or producers.
 {noformat}
 [2015-05-15 15:51:10,201] WARN [Controller-9092-to-broker-9092-send-thread], 
 Controller 9092 epoch 1 fails to send request 
 Name:LeaderAndIsrRequest;Version:0;Controller:9092;ControllerEpoch:1;CorrelationId:7;ClientId:id_9092-host_null-port_9092;Leaders:id:9092,host:kafka,port:9092;PartitionState:(lol,0)
  - 
 (LeaderAndIsrInfo:(Leader:9092,ISR:9092,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:1),AllReplicas:9092)
  to broker id:9092,host:kafka,port:9092. Reconnecting to broker. 
 (kafka.controller.RequestSendThread)
 java.nio.channels.ClosedChannelException
   at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
   at 
 kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
   at 
 kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 {noformat}
 Repro steps:
 run docker-compose up with the attached docker-compose yaml file.
 enter the kafka container (get the name with {{docker ps}}, {{docker exec -it 
 name bash}} to enter).
 run the following 
 {noformat}
 cd /opt/kafka_2.10-0.8.2.1/
 ./bin/kafka-topics.sh --list --zookeeper zk:2181
 ./bin/kafka-topics.sh --create --zookeeper zk:2181 --topic lol 
 --replicatior-factor 1 --partitions 1
 ./bin/kafka-topics.sh --create --zookeeper zk:2181 --topic testing 
 --replication-factor 1 --partitions 1
 ./bin/kafka-topics.sh --list --zookeeper zk:2181
 tail -f logs/controller.log 
 {noformat}
 This should allow you to observe the controller being upset. The zookeeper 
 instance is definitely reachable, the hostnames are correct as far as I can 
 tell. I am kind of at a loss as to what is happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2166) Recreation breaks topic-list

2015-05-04 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526860#comment-14526860
 ] 

Neha Narkhede commented on KAFKA-2166:
--

[~Alien2150] Thanks for reporting this issue. Which Kafka version do you see 
this behavior on? Have you tried this on trunk?

 Recreation breaks topic-list
 

 Key: KAFKA-2166
 URL: https://issues.apache.org/jira/browse/KAFKA-2166
 Project: Kafka
  Issue Type: Bug
Reporter: Thomas Zimmer

 Hi, here are the steps to reproduce the issue:
 * Create a topic called test
 * Delete the topic test
 * Recreate the topic test 
 What will happen is that you will see the topic in the topic-list but it's 
 marked as deleted:
  ./kafka-topics.sh --list --zookeeper zookpeer1.dev, zookeeper2.dev
 test - marked for deletion
 Is there a way to fix it without having to delete everything? We also tried 
 several restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-824) java.lang.NullPointerException in commitOffsets

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-824:

Affects Version/s: 0.8.2.0
 Assignee: (was: Neha Narkhede)
   Labels: newbie  (was: )

 java.lang.NullPointerException in commitOffsets 
 

 Key: KAFKA-824
 URL: https://issues.apache.org/jira/browse/KAFKA-824
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.7.2, 0.8.2.0
Reporter: Yonghui Zhao
  Labels: newbie
 Attachments: ZkClient.0.3.txt, ZkClient.0.4.txt, screenshot-1.jpg


 Neha Narkhede
 Yes, I have. Unfortunately, I never quite got around to fixing it. My guess is
 that it is caused by a race condition between the rebalance thread and
 the offset commit thread when a rebalance is triggered or the client is
 being shut down. Do you mind filing a bug?
 2013/03/25 12:08:32.020 WARN [ZookeeperConsumerConnector] [] 
 0_lu-ml-test10.bj-1364184411339-7c88f710 exception during commitOffsets
 java.lang.NullPointerException
 at org.I0Itec.zkclient.ZkConnection.writeData(ZkConnection.java:111)
 at org.I0Itec.zkclient.ZkClient$10.call(ZkClient.java:813)
 at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
 at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:809)
 at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:777)
 at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103)
 at 
 kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251)
 at 
 kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248)
 at scala.collection.Iterator$class.foreach(Iterator.scala:631)
 at 
 scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:549)
 at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
 at 
 scala.collection.JavaConversions$JCollectionWrapper.foreach(JavaConversions.scala:570)
 at 
 kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:248)
 at 
 kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:246)
 at scala.collection.Iterator$class.foreach(Iterator.scala:631)
 at kafka.utils.Pool$$anon$1.foreach(Pool.scala:53)
 at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
 at kafka.utils.Pool.foreach(Pool.scala:24)
 at 
 kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246)
 at 
 kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232)
 at 
 kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126)
 at kafka.utils.Utils$$anon$2.run(Utils.scala:58)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2169) Upgrade to zkclient-0.5

2015-05-04 Thread Neha Narkhede (JIRA)
Neha Narkhede created KAFKA-2169:


 Summary: Upgrade to zkclient-0.5
 Key: KAFKA-2169
 URL: https://issues.apache.org/jira/browse/KAFKA-2169
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.0
Reporter: Neha Narkhede
Priority: Critical


zkclient-0.5 has been released 
(http://mvnrepository.com/artifact/com.101tec/zkclient/0.5) and includes the fix for 
KAFKA-824.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2159) offsets.topic.segment.bytes and offsets.topic.retention.minutes are ignored

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2159:
-
Labels: newbie  (was: )

 offsets.topic.segment.bytes and offsets.topic.retention.minutes are ignored
 ---

 Key: KAFKA-2159
 URL: https://issues.apache.org/jira/browse/KAFKA-2159
 Project: Kafka
  Issue Type: Bug
  Components: offset manager
Affects Versions: 0.8.2.1
Reporter: Rafał Boniecki
  Labels: newbie

 My broker configuration:
 {quote}offsets.topic.num.partitions=20
 offsets.topic.segment.bytes=10485760
 offsets.topic.retention.minutes=10080{quote}
 Describe of __consumer_offsets topic:
 {quote}Topic:__consumer_offsets   PartitionCount:20   
 ReplicationFactor:3 Configs:segment.bytes=104857600,cleanup.policy=compact
   Topic: __consumer_offsets   Partition: 0Leader: 112 
 Replicas: 112,212,312   Isr: 212,312,112
   Topic: __consumer_offsets   Partition: 1Leader: 212 
 Replicas: 212,312,412   Isr: 212,312,412
   Topic: __consumer_offsets   Partition: 2Leader: 312 
 Replicas: 312,412,512   Isr: 312,412,512
   Topic: __consumer_offsets   Partition: 3Leader: 412 
 Replicas: 412,512,112   Isr: 412,512,112
   Topic: __consumer_offsets   Partition: 4Leader: 512 
 Replicas: 512,112,212   Isr: 512,212,112
   Topic: __consumer_offsets   Partition: 5Leader: 112 
 Replicas: 112,312,412   Isr: 312,412,112
   Topic: __consumer_offsets   Partition: 6Leader: 212 
 Replicas: 212,412,512   Isr: 212,412,512
   Topic: __consumer_offsets   Partition: 7Leader: 312 
 Replicas: 312,512,112   Isr: 312,512,112
   Topic: __consumer_offsets   Partition: 8Leader: 412 
 Replicas: 412,112,212   Isr: 412,212,112
   Topic: __consumer_offsets   Partition: 9Leader: 512 
 Replicas: 512,212,312   Isr: 512,212,312
   Topic: __consumer_offsets   Partition: 10   Leader: 112 
 Replicas: 112,412,512   Isr: 412,512,112
   Topic: __consumer_offsets   Partition: 11   Leader: 212 
 Replicas: 212,512,112   Isr: 212,512,112
   Topic: __consumer_offsets   Partition: 12   Leader: 312 
 Replicas: 312,112,212   Isr: 312,212,112
   Topic: __consumer_offsets   Partition: 13   Leader: 412 
 Replicas: 412,212,312   Isr: 412,212,312
   Topic: __consumer_offsets   Partition: 14   Leader: 512 
 Replicas: 512,312,412   Isr: 512,312,412
   Topic: __consumer_offsets   Partition: 15   Leader: 112 
 Replicas: 112,512,212   Isr: 512,212,112
   Topic: __consumer_offsets   Partition: 16   Leader: 212 
 Replicas: 212,112,312   Isr: 212,312,112
   Topic: __consumer_offsets   Partition: 17   Leader: 312 
 Replicas: 312,212,412   Isr: 312,212,412
   Topic: __consumer_offsets   Partition: 18   Leader: 412 
 Replicas: 412,312,512   Isr: 412,312,512
   Topic: __consumer_offsets   Partition: 19   Leader: 512 
 Replicas: 512,412,112   Isr: 512,412,112{quote}
 OffsetManager logs:
 {quote}2015-04-29 17:58:43:403 CEST DEBUG 
 [kafka-scheduler-3][kafka.server.OffsetManager] Compacting offsets cache.
 2015-04-29 17:58:43:403 CEST DEBUG 
 [kafka-scheduler-3][kafka.server.OffsetManager] Found 1 stale offsets (older 
 than 8640 ms).
 2015-04-29 17:58:43:404 CEST TRACE 
 [kafka-scheduler-3][kafka.server.OffsetManager] Removing stale offset and 
 metadata for [drafts,tasks,1]: OffsetAndMetadata[824,consumer_id = drafts, 
 time = 1430322433,0]
 2015-04-29 17:58:43:404 CEST TRACE 
 [kafka-scheduler-3][kafka.server.OffsetManager] Marked 1 offsets in 
 [__consumer_offsets,2] for deletion.
 2015-04-29 17:58:43:404 CEST DEBUG 
 [kafka-scheduler-3][kafka.server.OffsetManager] Removed 1 stale offsets in 1 
 milliseconds.{quote}
 Parameters are ignored and default values are used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2159) offsets.topic.segment.bytes and offsets.topic.retention.minutes are ignored

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2159:
-
Component/s: offset manager

 offsets.topic.segment.bytes and offsets.topic.retention.minutes are ignored
 ---

 Key: KAFKA-2159
 URL: https://issues.apache.org/jira/browse/KAFKA-2159
 Project: Kafka
  Issue Type: Bug
  Components: offset manager
Affects Versions: 0.8.2.1
Reporter: Rafał Boniecki
  Labels: newbie

 My broker configuration:
 {quote}offsets.topic.num.partitions=20
 offsets.topic.segment.bytes=10485760
 offsets.topic.retention.minutes=10080{quote}
 Describe of __consumer_offsets topic:
 {quote}Topic:__consumer_offsets   PartitionCount:20   
 ReplicationFactor:3 Configs:segment.bytes=104857600,cleanup.policy=compact
   Topic: __consumer_offsets   Partition: 0Leader: 112 
 Replicas: 112,212,312   Isr: 212,312,112
   Topic: __consumer_offsets   Partition: 1Leader: 212 
 Replicas: 212,312,412   Isr: 212,312,412
   Topic: __consumer_offsets   Partition: 2Leader: 312 
 Replicas: 312,412,512   Isr: 312,412,512
   Topic: __consumer_offsets   Partition: 3Leader: 412 
 Replicas: 412,512,112   Isr: 412,512,112
   Topic: __consumer_offsets   Partition: 4Leader: 512 
 Replicas: 512,112,212   Isr: 512,212,112
   Topic: __consumer_offsets   Partition: 5Leader: 112 
 Replicas: 112,312,412   Isr: 312,412,112
   Topic: __consumer_offsets   Partition: 6Leader: 212 
 Replicas: 212,412,512   Isr: 212,412,512
   Topic: __consumer_offsets   Partition: 7Leader: 312 
 Replicas: 312,512,112   Isr: 312,512,112
   Topic: __consumer_offsets   Partition: 8Leader: 412 
 Replicas: 412,112,212   Isr: 412,212,112
   Topic: __consumer_offsets   Partition: 9Leader: 512 
 Replicas: 512,212,312   Isr: 512,212,312
   Topic: __consumer_offsets   Partition: 10   Leader: 112 
 Replicas: 112,412,512   Isr: 412,512,112
   Topic: __consumer_offsets   Partition: 11   Leader: 212 
 Replicas: 212,512,112   Isr: 212,512,112
   Topic: __consumer_offsets   Partition: 12   Leader: 312 
 Replicas: 312,112,212   Isr: 312,212,112
   Topic: __consumer_offsets   Partition: 13   Leader: 412 
 Replicas: 412,212,312   Isr: 412,212,312
   Topic: __consumer_offsets   Partition: 14   Leader: 512 
 Replicas: 512,312,412   Isr: 512,312,412
   Topic: __consumer_offsets   Partition: 15   Leader: 112 
 Replicas: 112,512,212   Isr: 512,212,112
   Topic: __consumer_offsets   Partition: 16   Leader: 212 
 Replicas: 212,112,312   Isr: 212,312,112
   Topic: __consumer_offsets   Partition: 17   Leader: 312 
 Replicas: 312,212,412   Isr: 312,212,412
   Topic: __consumer_offsets   Partition: 18   Leader: 412 
 Replicas: 412,312,512   Isr: 412,312,512
   Topic: __consumer_offsets   Partition: 19   Leader: 512 
 Replicas: 512,412,112   Isr: 512,412,112{quote}
 OffsetManager logs:
 {quote}2015-04-29 17:58:43:403 CEST DEBUG 
 [kafka-scheduler-3][kafka.server.OffsetManager] Compacting offsets cache.
 2015-04-29 17:58:43:403 CEST DEBUG 
 [kafka-scheduler-3][kafka.server.OffsetManager] Found 1 stale offsets (older 
 than 8640 ms).
 2015-04-29 17:58:43:404 CEST TRACE 
 [kafka-scheduler-3][kafka.server.OffsetManager] Removing stale offset and 
 metadata for [drafts,tasks,1]: OffsetAndMetadata[824,consumer_id = drafts, 
 time = 1430322433,0]
 2015-04-29 17:58:43:404 CEST TRACE 
 [kafka-scheduler-3][kafka.server.OffsetManager] Marked 1 offsets in 
 [__consumer_offsets,2] for deletion.
 2015-04-29 17:58:43:404 CEST DEBUG 
 [kafka-scheduler-3][kafka.server.OffsetManager] Removed 1 stale offsets in 1 
 milliseconds.{quote}
 Parameters are ignored and default values are used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2156) Possibility to plug in custom MetricRegistry

2015-05-04 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526932#comment-14526932
 ] 

Neha Narkhede commented on KAFKA-2156:
--

[~sandris] KM = Kafka Metrics (org.apache.kafka.common.metrics) :)
It is worth considering supporting this directly in Kafka Metrics.

 Possibility to plug in custom MetricRegistry
 

 Key: KAFKA-2156
 URL: https://issues.apache.org/jira/browse/KAFKA-2156
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Affects Versions: 0.8.1.2
Reporter: Andras Sereny
Assignee: Jun Rao

 The trait KafkaMetricsGroup refers to Metrics.defaultRegistry() throughout. 
 It would be nice to be able to inject any MetricsRegistry instead of the 
 default one. 
 (My use case is that I'd like to channel Kafka metrics into our application's 
 metrics system, for which I'd need custom implementations of 
 com.yammer.metrics.core.Metric.)
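 A minimal sketch of the kind of hook being asked for, assuming only the yammer metrics API (the PluggableMetricsGroup trait and its registry() override point are hypothetical, not existing Kafka code):
 {code}
import com.yammer.metrics.Metrics
import com.yammer.metrics.core.{MetricName, MetricsRegistry}

// Hypothetical sketch: default to the global registry, but let applications
// override it instead of always going through Metrics.defaultRegistry().
trait PluggableMetricsGroup {
  protected def registry: MetricsRegistry = Metrics.defaultRegistry()

  def newCounter(name: MetricName) = registry.newCounter(name)
}

// An application could then channel metrics into its own registry:
object AppMetricsGroup extends PluggableMetricsGroup {
  private val appRegistry = new MetricsRegistry()
  override protected def registry: MetricsRegistry = appRegistry
}
 {code}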



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time

2015-05-04 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526945#comment-14526945
 ] 

Neha Narkhede edited comment on KAFKA-1387 at 5/4/15 5:58 PM:
--

When this happens, there isn't a way to get out of this without killing the 
broker. Marking it as a blocker.


was (Author: nehanarkhede):
When this happens, there isn't a way to get out of this without killing the 
broker. 

 Kafka getting stuck creating ephemeral node it has already created when two 
 zookeeper sessions are established in a very short period of time
 -

 Key: KAFKA-1387
 URL: https://issues.apache.org/jira/browse/KAFKA-1387
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Fedor Korotkiy
Priority: Blocker
  Labels: newbie, patch
 Attachments: kafka-1387.patch


 The Kafka broker re-registers itself in ZooKeeper every time the handleNewSession() 
 callback is invoked.
 https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
 Now imagine the following sequence of events.
 1) The ZooKeeper session is re-established. The handleNewSession() callback is 
 queued by the zkClient, but not invoked yet.
 2) The ZooKeeper session is re-established again, queueing the callback a second time.
 3) The first callback is invoked, creating the /broker/[id] ephemeral path.
 4) The second callback is invoked and tries to create the /broker/[id] path using 
 the createEphemeralPathExpectConflictHandleZKBug() function. But the path already 
 exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck in an infinite loop.
 The controller election code seems to have the same issue.
 I'm able to reproduce this issue on the 0.8.1 branch from GitHub using the 
 following configs.
 # zookeeper
 tickTime=10
 dataDir=/tmp/zk/
 clientPort=2101
 maxClientCnxns=0
 # kafka
 broker.id=1
 log.dir=/tmp/kafka
 zookeeper.connect=localhost:2101
 zookeeper.connection.timeout.ms=100
 zookeeper.sessiontimeout.ms=100
 Just start kafka and zookeeper and then pause zookeeper several times using 
 Ctrl-Z.
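 A minimal, hypothetical sketch of the fix idea implied by the report: when the ephemeral node already exists but holds exactly the data we were about to write (i.e. it was created by an earlier callback of the same session), treat the registration as done instead of retrying forever. The EphemeralRegistration object is an assumption for illustration; only the ZkClient calls are real:
 {code}
import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.exception.ZkNodeExistsException

object EphemeralRegistration {
  def register(zk: ZkClient, path: String, data: String): Unit = {
    try {
      zk.createEphemeral(path, data)
    } catch {
      case _: ZkNodeExistsException =>
        // returnNullIfPathNotExists = true, in case the node vanished in between
        val existing: String = zk.readData[String](path, true)
        if (existing != data) {
          // A different broker/session owns the node; only then keep retrying.
          throw new RuntimeException(s"Conflicting ephemeral node at $path: $existing")
        }
      // else: our own session already created it, nothing left to do
    }
  }
}
 {code}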



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1387:
-
Priority: Blocker  (was: Major)

When this happens, there isn't a way to get out of this without killing the 
broker. 

 Kafka getting stuck creating ephemeral node it has already created when two 
 zookeeper sessions are established in a very short period of time
 -

 Key: KAFKA-1387
 URL: https://issues.apache.org/jira/browse/KAFKA-1387
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Fedor Korotkiy
Priority: Blocker
  Labels: newbie, patch
 Attachments: kafka-1387.patch


 The Kafka broker re-registers itself in ZooKeeper every time the handleNewSession() 
 callback is invoked.
 https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
 Now imagine the following sequence of events.
 1) The ZooKeeper session is re-established. The handleNewSession() callback is 
 queued by the zkClient, but not invoked yet.
 2) The ZooKeeper session is re-established again, queueing the callback a second time.
 3) The first callback is invoked, creating the /broker/[id] ephemeral path.
 4) The second callback is invoked and tries to create the /broker/[id] path using 
 the createEphemeralPathExpectConflictHandleZKBug() function. But the path already 
 exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck in an infinite loop.
 The controller election code seems to have the same issue.
 I'm able to reproduce this issue on the 0.8.1 branch from GitHub using the 
 following configs.
 # zookeeper
 tickTime=10
 dataDir=/tmp/zk/
 clientPort=2101
 maxClientCnxns=0
 # kafka
 broker.id=1
 log.dir=/tmp/kafka
 zookeeper.connect=localhost:2101
 zookeeper.connection.timeout.ms=100
 zookeeper.sessiontimeout.ms=100
 Just start kafka and zookeeper and then pause zookeeper several times using 
 Ctrl-Z.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1387:
-
Labels: newbie patch zkclient-problems  (was: newbie patch)

 Kafka getting stuck creating ephemeral node it has already created when two 
 zookeeper sessions are established in a very short period of time
 -

 Key: KAFKA-1387
 URL: https://issues.apache.org/jira/browse/KAFKA-1387
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Fedor Korotkiy
Priority: Blocker
  Labels: newbie, patch, zkclient-problems
 Attachments: kafka-1387.patch


 The Kafka broker re-registers itself in ZooKeeper every time the handleNewSession() 
 callback is invoked.
 https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
 Now imagine the following sequence of events.
 1) The ZooKeeper session is re-established. The handleNewSession() callback is 
 queued by the zkClient, but not invoked yet.
 2) The ZooKeeper session is re-established again, queueing the callback a second time.
 3) The first callback is invoked, creating the /broker/[id] ephemeral path.
 4) The second callback is invoked and tries to create the /broker/[id] path using 
 the createEphemeralPathExpectConflictHandleZKBug() function. But the path already 
 exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck in an infinite loop.
 The controller election code seems to have the same issue.
 I'm able to reproduce this issue on the 0.8.1 branch from GitHub using the 
 following configs.
 # zookeeper
 tickTime=10
 dataDir=/tmp/zk/
 clientPort=2101
 maxClientCnxns=0
 # kafka
 broker.id=1
 log.dir=/tmp/kafka
 zookeeper.connect=localhost:2101
 zookeeper.connection.timeout.ms=100
 zookeeper.sessiontimeout.ms=100
 Just start kafka and zookeeper and then pause zookeeper several times using 
 Ctrl-Z.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2157) kafka-console-consumer.sh: Mismatch in CLI usage docs vs. Scala Option parsing

2015-05-04 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526972#comment-14526972
 ] 

Neha Narkhede commented on KAFKA-2157:
--

[~tvaughan77] topics is correct. Patch is appreciated :)

 kafka-console-consumer.sh: Mismatch in CLI usage docs vs. Scala Option 
 parsing
 

 Key: KAFKA-2157
 URL: https://issues.apache.org/jira/browse/KAFKA-2157
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Thomas Vaughan
Priority: Minor

 I built kafka-0.8.2.1 from source and noticed there's a mismatch between the 
 command line usage help and the actual arguments expected by the 
 ConsumerPerformance$ConsumerPerfConfig.class file.
 On the command line if you run bin/kafka-console-consumer.sh with no 
 arguments it claims you need these required fields:
 {code}
 --broker-list hostname:port,..,REQUIRED: broker info (the list of
  
   hostname:port  broker host and port for bootstrap.
 ...
 --topics topic1,topic2..  REQUIRED: The comma separated list of 
  
   topics to produce to  
 {code}
 Supplying that script with those flags will cause 
 joptsimple.OptionException.unrecognizedOption exceptions because what's 
 really needed are \--zookeeper and \--topic (singular, not plural) 
 options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2155) Add option to control ZK root for kafka.tools.ConsumerOffsetChecker

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-2155.
--
Resolution: Not A Problem

 Add option to control ZK root for kafka.tools.ConsumerOffsetChecker
 ---

 Key: KAFKA-2155
 URL: https://issues.apache.org/jira/browse/KAFKA-2155
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Affects Versions: 0.8.1.2
 Environment: Hortonworks 2.2.4 with Kafka 0.8.1.2.2
Reporter: Kjell Tore Fossbakk
Priority: Minor
  Labels: features

 Hello.
 We need to add an option to kafka.tools.ConsumerOffsetChecker that allows 
 control of the ZK root, which is currently hardcoded to /consumers/.
 A new option --zkroot would simply replace the contents of /consumers, 
 which defaultsTo(/consumers).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2155) Add option to control ZK root for kafka.tools.ConsumerOffsetChecker

2015-05-04 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526978#comment-14526978
 ] 

Neha Narkhede commented on KAFKA-2155:
--

You can just specify the zk root as part of the ZK connection string. 
{code}
./bin/kafka-consumer-offset-checker.sh --zookeeper localhost:2181/namespace
{code}

 Add option to control ZK root for kafka.tools.ConsumerOffsetChecker
 ---

 Key: KAFKA-2155
 URL: https://issues.apache.org/jira/browse/KAFKA-2155
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Affects Versions: 0.8.1.2
 Environment: Hortonworks 2.2.4 with Kafka 0.8.1.2.2
Reporter: Kjell Tore Fossbakk
Priority: Minor
  Labels: features

 Hello.
 We need to add an option to kafka.tools.ConsumerOffsetChecker that allows 
 control of the ZK root, which is currently hardcoded to /consumers/.
 A new option --zkroot would simply replace the contents of /consumers, 
 which defaultsTo(/consumers).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1843) Metadata fetch/refresh in new producer should handle all node connection states gracefully

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1843:
-
Labels: newbie++  (was: )

 Metadata fetch/refresh in new producer should handle all node connection 
 states gracefully
 --

 Key: KAFKA-1843
 URL: https://issues.apache.org/jira/browse/KAFKA-1843
 Project: Kafka
  Issue Type: Bug
  Components: clients, producer 
Affects Versions: 0.8.2.0
Reporter: Ewen Cheslack-Postava
Priority: Blocker
  Labels: newbie++

 KAFKA-1642 resolved some issues with the handling of broker connection states 
 to avoid high CPU usage, but made the minimal fix rather than the ideal one. 
 The code for handling the metadata fetch is difficult to get right because it 
 has to handle a lot of possible connectivity states and failure modes across 
 all the known nodes. It also needs to correctly integrate with the 
 surrounding event loop, providing correct poll() timeouts to both avoid busy 
 looping and make sure it wakes up and tries new nodes in the face of both 
 connection and request failures.
 A patch here should address a few issues:
 1. Make sure connection timeouts, as implemented in KAFKA-1842, are cleanly 
 integrated. This mostly means that when a connecting node is selected to 
 fetch metadata from, that the code notices that and sets the next timeout 
 based on the connection timeout rather than some other backoff.
 2. Rethink the logic and naming of NetworkClient.leastLoadedNode. That method 
 actually takes into account a) the current connectivity of each node, b) 
 whether the node had a recent connection failure, c) the load in terms of 
 in flight requests. It also needs to ensure that different clients don't use 
 the same ordering across multiple calls (which is already addressed in the 
 current code by nodeIndexOffset) and that we always eventually try all nodes 
 in the face of connection failures (which isn't currently handled by 
 leastLoadedNode and probably cannot be without tracking additional state). 
 This method also has to work for new consumer use cases even though it is 
 currently only used by the new producer's metadata fetch. Finally it has to 
 properly handle when other code calls initiateConnect() since the normal path 
 for sending messages also initiates connections.
 We can already say that there is an order of preference given a single call 
 (as follows), but making this work across multiple calls when some initial 
 choices fail to connect or return metadata *and* connection states may be 
 changing is much more difficult.
  * Connected, zero in flight requests - the request can be sent immediately
  * Connecting node - it will hopefully be connected very soon and by 
 definition has no in flight requests
  * Disconnected - same reasoning as for a connecting node
  * Connected, > 0 in flight requests - we consider any # of in flight 
 requests as a big enough backlog to delay the request a lot.
 We could use an approach that better accounts for # of in flight requests 
 rather than just turning it into a boolean variable, but that probably 
 introduces much more complexity than it is worth.
 3. The most difficult case to handle so far has been when leastLoadedNode 
 returns a disconnected node to maybeUpdateMetadata as its best option. 
 Properly handling the two resulting cases (initiateConnect fails immediately 
 vs. taking some time to possibly establish the connection) is tricky.
 4. Consider optimizing for the failure cases. The most common cases are when 
 you already have an active connection and can immediately get the metadata or 
 you need to establish a connection, but the connection and metadata 
 request/response happen very quickly. These common cases are infrequent 
 enough (default every 5 min) that establishing an extra connection isn't a 
 big deal as long as it's eventually cleaned up. The edge cases, like network 
 partitions where some subset of nodes become unreachable for a long period, 
 are harder to reason about but we should be sure we will always be able to 
 gracefully recover from them.
 KAFKA-1642 enumerated the possible outcomes of a single call to 
 maybeUpdateMetadata. A good fix for this would consider all of those outcomes 
 for repeated calls to 
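 The preference order above can be captured roughly as follows; this is a hypothetical sketch over a made-up NodeState model, not the real NetworkClient types, and it ignores the reconnect-backoff and blackout details discussed in the ticket:
 {code}
// Hypothetical model of the preference order: connected and idle first, then
// connecting, then disconnected, and connected-but-busy last.
sealed trait ConnState
case object Connected extends ConnState
case object Connecting extends ConnState
case object Disconnected extends ConnState

case class NodeState(id: Int, state: ConnState, inFlightRequests: Int)

object MetadataNodeChooser {
  private def preference(n: NodeState): Int = n match {
    case NodeState(_, Connected, 0)    => 0 // can send immediately
    case NodeState(_, Connecting, _)   => 1 // should be ready soon, no backlog
    case NodeState(_, Disconnected, _) => 2 // same reasoning as a connecting node
    case NodeState(_, Connected, _)    => 3 // any backlog delays the request a lot
  }

  // offset rotates the starting point so repeated calls don't always begin with
  // the same node (the role nodeIndexOffset plays in the current code).
  def leastLoadedNode(nodes: Seq[NodeState], offset: Int): Option[NodeState] = {
    if (nodes.isEmpty) None
    else {
      val shift = offset % nodes.size
      val rotated = nodes.drop(shift) ++ nodes.take(shift)
      Some(rotated.minBy(preference))
    }
  }
}
 {code}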



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1843) Metadata fetch/refresh in new producer should handle all node connection states gracefully

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1843:
-
Priority: Blocker  (was: Major)

 Metadata fetch/refresh in new producer should handle all node connection 
 states gracefully
 --

 Key: KAFKA-1843
 URL: https://issues.apache.org/jira/browse/KAFKA-1843
 Project: Kafka
  Issue Type: Bug
  Components: clients, producer 
Affects Versions: 0.8.2.0
Reporter: Ewen Cheslack-Postava
Priority: Blocker
  Labels: newbie++

 KAFKA-1642 resolved some issues with the handling of broker connection states 
 to avoid high CPU usage, but made the minimal fix rather than the ideal one. 
 The code for handling the metadata fetch is difficult to get right because it 
 has to handle a lot of possible connectivity states and failure modes across 
 all the known nodes. It also needs to correctly integrate with the 
 surrounding event loop, providing correct poll() timeouts to both avoid busy 
 looping and make sure it wakes up and tries new nodes in the face of both 
 connection and request failures.
 A patch here should address a few issues:
 1. Make sure connection timeouts, as implemented in KAFKA-1842, are cleanly 
 integrated. This mostly means that when a connecting node is selected to 
 fetch metadata from, that the code notices that and sets the next timeout 
 based on the connection timeout rather than some other backoff.
 2. Rethink the logic and naming of NetworkClient.leastLoadedNode. That method 
 actually takes into account a) the current connectivity of each node, b) 
 whether the node had a recent connection failure, c) the load in terms of 
 in flight requests. It also needs to ensure that different clients don't use 
 the same ordering across multiple calls (which is already addressed in the 
 current code by nodeIndexOffset) and that we always eventually try all nodes 
 in the face of connection failures (which isn't currently handled by 
 leastLoadedNode and probably cannot be without tracking additional state). 
 This method also has to work for new consumer use cases even though it is 
 currently only used by the new producer's metadata fetch. Finally it has to 
 properly handle when other code calls initiateConnect() since the normal path 
 for sending messages also initiates connections.
 We can already say that there is an order of preference given a single call 
 (as follows), but making this work across multiple calls when some initial 
 choices fail to connect or return metadata *and* connection states may be 
 changing is much more difficult.
  * Connected, zero in flight requests - the request can be sent immediately
  * Connecting node - it will hopefully be connected very soon and by 
 definition has no in flight requests
  * Disconnected - same reasoning as for a connecting node
  * Connected, > 0 in flight requests - we consider any # of in flight 
 requests as a big enough backlog to delay the request a lot.
 We could use an approach that better accounts for # of in flight requests 
 rather than just turning it into a boolean variable, but that probably 
 introduces much more complexity than it is worth.
 3. The most difficult case to handle so far has been when leastLoadedNode 
 returns a disconnected node to maybeUpdateMetadata as its best option. 
 Properly handling the two resulting cases (initiateConnect fails immediately 
 vs. taking some time to possibly establish the connection) is tricky.
 4. Consider optimizing for the failure cases. The most common cases are when 
 you already have an active connection and can immediately get the metadata or 
 you need to establish a connection, but the connection and metadata 
 request/response happen very quickly. These common cases are infrequent 
 enough (default every 5 min) that establishing an extra connection isn't a 
 big deal as long as it's eventually cleaned up. The edge cases, like network 
 partitions where some subset of nodes become unreachable for a long period, 
 are harder to reason about but we should be sure we will always be able to 
 gracefully recover from them.
 KAFKA-1642 enumerated the possible outcomes of a single call to 
 maybeUpdateMetadata. A good fix for this would consider all of those outcomes 
 for repeated calls to 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2139) Add a separate controller message queue with higher priority on broker side

2015-05-04 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527048#comment-14527048
 ] 

Neha Narkhede commented on KAFKA-2139:
--

[~becket_qin] Left some comments regarding the design on the wiki. 

 Add a separate controller message queue with higher priority on broker side 
 ---

 Key: KAFKA-2139
 URL: https://issues.apache.org/jira/browse/KAFKA-2139
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin

 This ticket is supposed to work together with KAFKA-2029. 
 There are two issues with the current controller-to-broker messages.
 1. On the controller side the messages are sent without synchronization.
 2. On the broker side the controller messages share the same queue as client 
 messages.
 The problem here is that brokers process the controller messages for the same 
 partition at different times and the variation can be big. This causes 
 unnecessary data loss and prolongs preferred leader election / controlled 
 shutdown / partition reassignment, etc.
 KAFKA-2029 was trying to add a boundary between messages for different 
 partitions. For example, before the leader migration for the previous partition 
 finishes, the leader migration for the next partition won't begin.
 This ticket is trying to let the broker process controller messages faster. The 
 idea is to have a separate queue to hold controller messages: if there are 
 controller messages, the KafkaApi thread will take care of those first; 
 otherwise it will process messages from clients.
 These two tickets are not the ultimate solution to the current controller 
 problems, but just mitigate them with minor code changes. Moving forward, we 
 still need to think about rewriting the controller in a cleaner way.
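 A minimal sketch of the prioritization being proposed, with hypothetical names (not actual KafkaApis code): controller requests go to their own queue and are drained before client requests are looked at.
 {code}
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// Hypothetical sketch: keep controller requests in their own queue and always
// check it before touching the client request queue.
class PrioritizedRequestQueues[R] {
  private val controllerQueue = new LinkedBlockingQueue[R]()
  private val clientQueue = new LinkedBlockingQueue[R]()

  def putController(r: R): Unit = controllerQueue.put(r)
  def putClient(r: R): Unit = clientQueue.put(r)

  // Controller messages are handled first when present; otherwise fall back to
  // waiting briefly for a client message.
  def nextRequest(timeoutMs: Long): Option[R] = {
    Option(controllerQueue.poll())
      .orElse(Option(clientQueue.poll(timeoutMs, TimeUnit.MILLISECONDS)))
  }
}
 {code}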



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1843) Metadata fetch/refresh in new producer should handle all node connection states gracefully

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1843:
-
Labels:   (was: newbie++)

 Metadata fetch/refresh in new producer should handle all node connection 
 states gracefully
 --

 Key: KAFKA-1843
 URL: https://issues.apache.org/jira/browse/KAFKA-1843
 Project: Kafka
  Issue Type: Bug
  Components: clients, producer 
Affects Versions: 0.8.2.0
Reporter: Ewen Cheslack-Postava
Priority: Blocker

 KAFKA-1642 resolved some issues with the handling of broker connection states 
 to avoid high CPU usage, but made the minimal fix rather than the ideal one. 
 The code for handling the metadata fetch is difficult to get right because it 
 has to handle a lot of possible connectivity states and failure modes across 
 all the known nodes. It also needs to correctly integrate with the 
 surrounding event loop, providing correct poll() timeouts to both avoid busy 
 looping and make sure it wakes up and tries new nodes in the face of both 
 connection and request failures.
 A patch here should address a few issues:
 1. Make sure connection timeouts, as implemented in KAFKA-1842, are cleanly 
 integrated. This mostly means that when a connecting node is selected to 
 fetch metadata from, that the code notices that and sets the next timeout 
 based on the connection timeout rather than some other backoff.
 2. Rethink the logic and naming of NetworkClient.leastLoadedNode. That method 
 actually takes into account a) the current connectivity of each node, b) 
 whether the node had a recent connection failure, c) the load in terms of 
 in flight requests. It also needs to ensure that different clients don't use 
 the same ordering across multiple calls (which is already addressed in the 
 current code by nodeIndexOffset) and that we always eventually try all nodes 
 in the face of connection failures (which isn't currently handled by 
 leastLoadedNode and probably cannot be without tracking additional state). 
 This method also has to work for new consumer use cases even though it is 
 currently only used by the new producer's metadata fetch. Finally it has to 
 properly handle when other code calls initiateConnect() since the normal path 
 for sending messages also initiates connections.
 We can already say that there is an order of preference given a single call 
 (as follows), but making this work across multiple calls when some initial 
 choices fail to connect or return metadata *and* connection states may be 
 changing is much more difficult.
  * Connected, zero in flight requests - the request can be sent immediately
  * Connecting node - it will hopefully be connected very soon and by 
 definition has no in flight requests
  * Disconnected - same reasoning as for a connecting node
  * Connected, > 0 in flight requests - we consider any # of in flight 
 requests as a big enough backlog to delay the request a lot.
 We could use an approach that better accounts for # of in flight requests 
 rather than just turning it into a boolean variable, but that probably 
 introduces much more complexity than it is worth.
 3. The most difficult case to handle so far has been when leastLoadedNode 
 returns a disconnected node to maybeUpdateMetadata as its best option. 
 Properly handling the two resulting cases (initiateConnect fails immediately 
 vs. taking some time to possibly establish the connection) is tricky.
 4. Consider optimizing for the failure cases. The most common cases are when 
 you already have an active connection and can immediately get the metadata or 
 you need to establish a connection, but the connection and metadata 
 request/response happen very quickly. These common cases are infrequent 
 enough (default every 5 min) that establishing an extra connection isn't a 
 big deal as long as it's eventually cleaned up. The edge cases, like network 
 partitions where some subset of nodes become unreachable for a long period, 
 are harder to reason about but we should be sure we will always be able to 
 gracefully recover from them.
 KAFKA-1642 enumerated the possible outcomes of a single call to 
 maybeUpdateMetadata. A good fix for this would consider all of those outcomes 
 for repeated calls to 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1886) SimpleConsumer swallowing ClosedByInterruptException

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-1886.
--
Resolution: Fixed

Thanks, pushed to trunk

 SimpleConsumer swallowing ClosedByInterruptException
 

 Key: KAFKA-1886
 URL: https://issues.apache.org/jira/browse/KAFKA-1886
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Reporter: Aditya A Auradkar
Assignee: Aditya Auradkar
 Attachments: KAFKA-1886.patch, KAFKA-1886.patch, 
 KAFKA-1886_2015-02-02_13:57:23.patch, KAFKA-1886_2015-04-28_10:27:39.patch


 This issue was originally reported by a Samza developer. I've included an 
 exchange of mine with Chris Riccomini. I'm trying to reproduce the problem on 
 my dev setup.
 From: criccomi
 Hey all,
 Samza's BrokerProxy [1] threads appear to be wedging randomly when we try to 
 interrupt its fetcher thread. I noticed that SimpleConsumer.scala catches 
 Throwable in its sendRequest method [2]. I'm wondering: if 
 blockingChannel.send/receive throws a ClosedByInterruptException
 when the thread is interrupted, what happens? It looks like sendRequest will 
 catch the exception (which I think clears the thread's interrupted flag) and 
 then retry the send. If the send succeeds on the retry, I think the 
 ClosedByInterruptException is effectively swallowed, and the BrokerProxy will 
 continue fetching messages as though its thread was never interrupted.
 Am I misunderstanding how things work?
 Cheers,
 Chris
 [1] 
 https://github.com/apache/incubator-samza/blob/master/samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala#L126
 [2] 
 https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L75
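 A minimal, hypothetical sketch of the behaviour Chris is asking for: if the blocking call is interrupted, restore the interrupt status and rethrow instead of swallowing the exception inside the retry. The sendAndReceive and reconnect parameters stand in for the real blocking-channel calls and are assumptions of this sketch:
 {code}
import java.nio.channels.ClosedByInterruptException

object InterruptAwareSend {
  def sendRequest[T](sendAndReceive: () => T, reconnect: () => Unit): T = {
    try {
      sendAndReceive()
    } catch {
      case e: ClosedByInterruptException =>
        Thread.currentThread().interrupt() // keep the interrupt status visible to the caller
        throw e                            // do not swallow and retry
      case _: Throwable =>
        reconnect()                        // other I/O errors: reconnect and retry once
        sendAndReceive()
    }
  }
}
 {code}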



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2158) Close all fetchers in AbstractFetcherManager without blocking

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2158:
-
Reviewer: Jun Rao

 Close all fetchers in AbstractFetcherManager without blocking
 -

 Key: KAFKA-2158
 URL: https://issues.apache.org/jira/browse/KAFKA-2158
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 0.8.2.0
Reporter: Jiasheng Wang
 Attachments: KAFKA-2158.patch


 def closeAllFetchers() {
   mapLock synchronized {
     for ((_, fetcher) <- fetcherThreadMap) {
       fetcher.shutdown()
     }
     fetcherThreadMap.clear()
   }
 }
 closeAllFetchers() in AbstractFetcherManager.scala is time consuming because each 
 fetcher's shutdown() call blocks until awaitShutdown() returns, so the fetchers 
 are stopped one at a time. As a result it slows down the restart of the Kafka 
 service.
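 A minimal, hypothetical sketch of the non-blocking variant: first ask every fetcher to begin shutting down, then wait for all of them, so the total wait is roughly the slowest fetcher rather than the sum over all fetchers. The two-phase initiateShutdown/awaitShutdown split mirrors ShutdownableThread.shutdown(), but the Fetcher trait and FetcherManager class below are assumptions for illustration:
 {code}
import scala.collection.mutable

trait Fetcher {
  def initiateShutdown(): Unit
  def awaitShutdown(): Unit
}

class FetcherManager(fetcherThreadMap: mutable.Map[String, Fetcher]) {
  private val mapLock = new Object

  def closeAllFetchers(): Unit = {
    mapLock synchronized {
      fetcherThreadMap.values.foreach(_.initiateShutdown()) // signal every fetcher first
      fetcherThreadMap.values.foreach(_.awaitShutdown())    // then wait for each one
      fetcherThreadMap.clear()
    }
  }
}
 {code}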



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2153) kafka-patch-review tool uploads a patch even if it is empty

2015-05-04 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2153:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks, pushed to trunk.

 kafka-patch-review tool uploads a patch even if it is empty
 ---

 Key: KAFKA-2153
 URL: https://issues.apache.org/jira/browse/KAFKA-2153
 Project: Kafka
  Issue Type: Bug
Reporter: Ashish K Singh
Assignee: Ashish K Singh
 Attachments: KAFKA-2153.patch


 The kafka-patch-review tool is great and a big help. However, sometimes one 
 forgets to commit the changes made and runs this tool, and the tool ends up 
 uploading an empty patch. It would be nice to catch this and notify the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-313) Add JSON/CSV output and looping options to ConsumerGroupCommand

2015-05-04 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527075#comment-14527075
 ] 

Neha Narkhede commented on KAFKA-313:
-

[~singhashish] I'm sorry for the delay. Not sure how this slipped through the 
list that [~junrao] is maintaining for patches under review. We really need our 
review-nag script :)
This just crossed my mind. Since the last time you uploaded this patch, we 
started the KIP Review process for all user-facing changes. My guess is that 
changes to user-facing tools are included in that. Would you mind starting a 
quick KIP discussion on this? It should be a quick vote. I can help merge your 
patch right after (we might need a rebase).

 Add JSON/CSV output and looping options to ConsumerGroupCommand
 ---

 Key: KAFKA-313
 URL: https://issues.apache.org/jira/browse/KAFKA-313
 Project: Kafka
  Issue Type: Improvement
Reporter: Dave DeMaagd
Assignee: Ashish K Singh
Priority: Minor
  Labels: newbie, patch
 Fix For: 0.8.3

 Attachments: KAFKA-313-2012032200.diff, KAFKA-313.1.patch, 
 KAFKA-313.patch, KAFKA-313_2015-02-23_18:11:32.patch


 Adds:
 * '--loop N' - causes the program to loop forever, sleeping for up to N 
 seconds between loops (loop time minus collection time, unless that's less 
 than 0, at which point it will just run again immediately)
 * '--asjson' - display as a JSON string instead of the more human readable 
 output format.
 Neither of the above depends on the other (you can loop in the human-readable 
 output, or do a single-shot execution with JSON output). Existing behavior/output 
 is maintained if neither of the above is used. Diff attached.
 Impacted files:
 core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1621) Standardize --messages option in perf scripts

2015-04-28 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1621:
-
Resolution: Fixed
  Assignee: Rekha Joshi
Status: Resolved  (was: Patch Available)

Merged PR #58

 Standardize --messages option in perf scripts
 -

 Key: KAFKA-1621
 URL: https://issues.apache.org/jira/browse/KAFKA-1621
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Jay Kreps
Assignee: Rekha Joshi
  Labels: newbie

 This option is specified in PerfConfig and is used by the producer, consumer 
 and simple consumer perf commands. The docstring on the argument does not 
 list it as required, but the producer performance test requires it while the 
 others don't.
 We should standardize this so that either all the commands require the option 
 and it is marked as required in the docstring or none of them list it as 
 required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2015-04-27 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516298#comment-14516298
 ] 

Neha Narkhede commented on KAFKA-1367:
--

[~jjkoshy] That would work too but looks like [~abiletskyi] is suggesting that 
it is not included as part of KIP-4. Maybe we can have whoever picks this JIRA 
discuss this change as part of a separate KIP?

 Broker topic metadata not kept in sync with ZooKeeper
 -

 Key: KAFKA-1367
 URL: https://issues.apache.org/jira/browse/KAFKA-1367
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Ryan Berdeen
Assignee: Ashish K Singh
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1367.txt


 When a broker is restarted, the topic metadata responses from the brokers 
 will be incorrect (different from ZooKeeper) until a preferred replica leader 
 election.
 In the metadata, it looks like leaders are correctly removed from the ISR 
 when a broker disappears, but followers are not. Then, when a broker 
 reappears, the ISR is never updated.
 I used a variation of the Vagrant setup created by Joe Stein to reproduce 
 this with latest from the 0.8.1 branch: 
 https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release

2015-04-27 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1054:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks. Merged the github PR.

 Eliminate Compilation Warnings for 0.8 Final Release
 

 Key: KAFKA-1054
 URL: https://issues.apache.org/jira/browse/KAFKA-1054
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
Assignee: Ismael Juma
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1054-20150426-V1.patch, 
 KAFKA-1054-20150426-V2.patch, KAFKA-1054.patch, KAFKA-1054_Mar_10_2015.patch


 Currently we have a total of 38 warnings from compiling the 0.8 source code:
 1) 3 from unchecked type patterns
 2) 6 from unchecked conversions
 3) 29 from deprecated Hadoop API functions
 It would be better to fix these before the final 0.8 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1748) Decouple system test cluster resources definition from service definitions

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513111#comment-14513111
 ] 

Neha Narkhede commented on KAFKA-1748:
--

[~ewencp] With the ducktape work, I guess we can close this?

 Decouple system test cluster resources definition from service definitions
 --

 Key: KAFKA-1748
 URL: https://issues.apache.org/jira/browse/KAFKA-1748
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
 Attachments: KAFKA-1748.patch, KAFKA-1748_2014-11-03_12:04:18.patch, 
 KAFKA-1748_2014-11-14_14:54:17.patch


 Currently the system tests use JSON files that specify the set of services 
 for each test and where they should run (i.e. hostname). These currently 
 assume that you already have SSH keys setup, use the same username on the 
 host running the tests and the test cluster, don't require any additional 
 ssh/scp/rsync flags, and assume you'll always have a fixed set of compute 
 resources (or that you'll spend a lot of time editing config files).
 While we don't want a whole cluster resource manager in the system tests, a 
 bit more flexibility would make it easier to, e.g., run tests against a local 
 vagrant cluster or on dynamically allocated EC2 instances. We can separate 
 out the basic resource spec (i.e. json specifying how to access machines) 
 from the service definition (i.e. a broker should run with settings x, y, z). 
 Restricting to a very simple set of mappings (i.e. map services to hosts with 
 round robin, optionally restricting to no reuse of hosts) should keep things 
 simple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2034) sourceCompatibility not set in Kafka build.gradle

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2034:
-
Resolution: Fixed
  Reviewer: Ewen Cheslack-Postava
  Assignee: Ismael Juma
Status: Resolved  (was: Patch Available)

Thanks for the patch. Pushed to trunk.

 sourceCompatibility not set in Kafka build.gradle
 -

 Key: KAFKA-2034
 URL: https://issues.apache.org/jira/browse/KAFKA-2034
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.2.1
Reporter: Derek Bassett
Assignee: Ismael Juma
Priority: Minor
  Labels: newbie
 Attachments: KAFKA-2034.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 build.gradle does not explicitly set the sourceCompatibility version. This 
 allows Kafka, when built with Java 1.8, to emit class files with the wrong 
 target version. It also allows Java 1.8 features to be merged into Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1351) String.format is very expensive in Scala

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513096#comment-14513096
 ] 

Neha Narkhede commented on KAFKA-1351:
--

Not sure that this is still a concern. I was hoping whoever picks this up can 
do a quick microbenchmark to see if the suggestion in the description is really 
worth a change or not.
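For reference, a rough, hypothetical microbenchmark along those lines: time Scala's String.format against plain concatenation for a log-style message. Something this naive (no JMH, no warmup isolation) only gives indicative numbers, but it is enough to sanity-check whether the change is worth making.
{code}
object StringFormatBench {
  // Run a block repeatedly and print the elapsed wall-clock time.
  def time(label: String, iterations: Int)(body: => Unit): Unit = {
    val start = System.nanoTime()
    var i = 0
    while (i < iterations) { body; i += 1 }
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"$label: $elapsedMs ms for $iterations iterations")
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000
    val topic = "my-topic"
    val partition = 7
    val offset = 12345L
    time("String.format", n) {
      "Fetching %s-%d at offset %d".format(topic, partition, offset)
    }
    time("concatenation", n) {
      "Fetching " + topic + "-" + partition + " at offset " + offset
    }
  }
}
{code}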

 String.format is very expensive in Scala
 

 Key: KAFKA-1351
 URL: https://issues.apache.org/jira/browse/KAFKA-1351
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.7.2, 0.8.0, 0.8.1
Reporter: Neha Narkhede
  Labels: newbie
 Fix For: 0.8.3

 Attachments: KAFKA-1351.patch, KAFKA-1351_2014-04-07_18:02:18.patch, 
 KAFKA-1351_2014-04-09_15:40:11.patch


 As found in KAFKA-1350, logging is causing significant overhead in the 
 performance of a Kafka server. There are several info statements that use 
 String.format which is particularly expensive. We should investigate adding 
 our own version of String.format that merely uses string concatenation under 
 the covers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1293) Mirror maker housecleaning

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-1293.
--
Resolution: Fixed

Closing based on [~mwarhaftig]'s latest comment.

 Mirror maker housecleaning
 --

 Key: KAFKA-1293
 URL: https://issues.apache.org/jira/browse/KAFKA-1293
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.1
Reporter: Jay Kreps
Priority: Minor
  Labels: usability
 Attachments: KAFKA-1293.patch


 Mirror maker uses its own convention for command-line arguments, e.g. 
 --num.producers, whereas everything else follows the Unix convention like 
 --num-producers. This is annoying because when running different tools you 
 have to constantly remember the quirks of whoever wrote each tool.
 Mirror maker should also have a top-level wrapper script in bin/ to make tab 
 completion work and so you don't have to remember the fully qualified class 
 name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1277) Keep the summery/description when updating the RB with kafka-patch-review

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-1277.
--
Resolution: Incomplete

Closing due to inactivity. 

 Keep the summery/description when updating the RB with kafka-patch-review
 -

 Key: KAFKA-1277
 URL: https://issues.apache.org/jira/browse/KAFKA-1277
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Manikumar Reddy
 Attachments: KAFKA-1277.patch, KAFKA-1277.patch, KAFKA-1277.patch, 
 KAFKA-1277_2014-10-04_16:39:56.patch, KAFKA-1277_2014-10-04_16:51:20.patch, 
 KAFKA-1277_2014-10-04_16:57:30.patch, KAFKA-1277_2014-10-04_17:00:37.patch, 
 KAFKA-1277_2014-10-04_17:01:43.patch, KAFKA-1277_2014-10-04_17:03:08.patch, 
 KAFKA-1277_2014-10-04_17:09:02.patch, KAFKA-1277_2014-10-05_11:04:33.patch, 
 KAFKA-1277_2014-10-05_11:09:08.patch, KAFKA-1277_2014-10-05_11:10:50.patch, 
 KAFKA-1277_2014-10-05_11:18:17.patch


 Today the kafka-patch-review tool always uses a default title and description 
 if they are not specified, even when updating an existing RB. It would be 
 better to leave the current title/description as is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513094#comment-14513094
 ] 

Neha Narkhede commented on KAFKA-2140:
--

+1

 Improve code readability
 

 Key: KAFKA-2140
 URL: https://issues.apache.org/jira/browse/KAFKA-2140
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Minor
 Attachments: KAFKA-2140.patch


 There are a number of places where code could be written in a more readable 
 and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1737) Document required ZkSerializer for ZkClient used with AdminUtils

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513312#comment-14513312
 ] 

Neha Narkhede commented on KAFKA-1737:
--

[~vivekpm] Makes sense. Would you like to upload what you have?

 Document required ZkSerializer for ZkClient used with AdminUtils
 

 Key: KAFKA-1737
 URL: https://issues.apache.org/jira/browse/KAFKA-1737
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.8.1.1
Reporter: Stevo Slavic
Assignee: Vivek Madani
Priority: Minor
 Attachments: KAFKA-1737.patch


 {{ZkClient}} instances passed to {{AdminUtils}} calls must have 
 {{kafka.utils.ZKStringSerializer}} set as {{ZkSerializer}}. Otherwise 
 commands executed via {{AdminUtils}} may not be seen/recognizable to broker, 
 producer or consumer. E.g. producer (with auto topic creation turned off) 
 will not be able to send messages to a topic created via {{AdminUtils}}, it 
 will throw {{UnknownTopicOrPartitionException}}.
 Please consider at least documenting this requirement in {{AdminUtils}} 
 scaladoc.
 For more info see [related discussion on Kafka user mailing 
 list|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAAUywg-oihNiXuQRYeS%3D8Z3ymsmEHo6ghLs%3Dru4nbm%2BdHVz6TA%40mail.gmail.com%3E].
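 For reference, a minimal example of the requirement, assuming the 0.8.x AdminUtils.createTopic(zkClient, topic, partitions, replicationFactor, topicConfig) signature (check the exact signature for your version; the timeouts and topic settings below are arbitrary example values):
 {code}
import java.util.Properties
import org.I0Itec.zkclient.ZkClient
import kafka.admin.AdminUtils
import kafka.utils.ZKStringSerializer

object CreateTopicExample {
  def main(args: Array[String]): Unit = {
    // Passing ZKStringSerializer is the crucial part: without it, data written by
    // AdminUtils is not readable by brokers, producers or consumers.
    val zkClient = new ZkClient("localhost:2181", 10000, 10000, ZKStringSerializer)
    try {
      AdminUtils.createTopic(zkClient, "example-topic", 4, 1, new Properties())
    } finally {
      zkClient.close()
    }
  }
}
 {code}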



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2080) quick cleanup of producer performance scripts

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2080:
-
Component/s: tools

 quick cleanup of producer performance scripts
 -

 Key: KAFKA-2080
 URL: https://issues.apache.org/jira/browse/KAFKA-2080
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Gwen Shapira
Assignee: Gwen Shapira
  Labels: newbie

 We have two producer performance tools at the moment: one at 
 o.a.k.client.tools and one at kafka.tools
 bin/kafka-producer-perf-test.sh is calling the kafka.tools one.
 org.apache.kafka.clients.tools.ProducerPerformance has --messages listed as 
 optional (with default) while leaving the parameter out results in an error.
 Cleanup will include:
 * Removing the kafka.tools performance tool
 * Changing the shellscript to use new tool
 * Fix the misleading documentation for --messages
 * Adding both performance tools to the kafka docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2041) Add ability to specify a KeyClass for KafkaLog4jAppender

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2041:
-
Status: Patch Available  (was: Open)

 Add ability to specify a KeyClass for KafkaLog4jAppender
 

 Key: KAFKA-2041
 URL: https://issues.apache.org/jira/browse/KAFKA-2041
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Benoy Antony
Assignee: Jun Rao
 Attachments: KAFKA-2041.patch, kafka-2041-001.patch, 
 kafka-2041-002.patch, kafka-2041-003.patch


 KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. 
 Since there is no key or explicit partition number, the messages are sent to 
 random partitions. 
 In some cases, it is possible to derive a key from the message itself. 
 So it may be beneficial to enable KafkaLog4jAppender to accept KeyClass which 
 will provide a key for a given message.
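 A minimal, hypothetical sketch of what such a KeyClass contract could look like (the names are assumptions, not an existing Kafka interface): the appender would instantiate the configured class and use it to derive a key for every log message it publishes.
 {code}
// Hypothetical names, not an existing Kafka interface.
trait MessageKeyer {
  def keyFor(message: String): String
}

// Example implementation: key each log line by everything before the first space
// (e.g. a request id or logger name) so related lines land in the same partition.
class PrefixKeyer extends MessageKeyer {
  override def keyFor(message: String): String =
    message.takeWhile(_ != ' ')
}
 {code}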



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2003) Add upgrade tests

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2003:
-
Component/s: system tests

 Add upgrade tests
 -

 Key: KAFKA-2003
 URL: https://issues.apache.org/jira/browse/KAFKA-2003
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Gwen Shapira
Assignee: Ashish K Singh

 To test protocol changes, compatibility, and the upgrade process, we need a good 
 way to test different versions of the product together and to test the end-to-end 
 upgrade process.
 For example, for a 0.8.2 to 0.8.3 test we want to check:
 * Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers?
 * Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time?
 * Can 0.8.2 clients run against a cluster of 0.8.3 brokers?
 There are probably more questions, but an automated framework that can test 
 these and report results will be a good start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2108) Node deleted all data and re-sync from replicas after attempted upgrade from 0.8.1.1 to 0.8.2.0

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-2108.
--
Resolution: Incomplete

Closing due to inactivity. [~slimunholyone], [~becket_qin] feel free to reopen 
if you have new findings.

 Node deleted all data and re-sync from replicas after attempted upgrade from 
 0.8.1.1 to 0.8.2.0
 ---

 Key: KAFKA-2108
 URL: https://issues.apache.org/jira/browse/KAFKA-2108
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Thunder Stumpges

 Per [email 
 thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3CCAA%2BBczTUBqg1-tpcUjwfZgZYZyOXC-Myuhd_2EaGkeKWkrCVUQ%40mail.gmail.com%3E]
  in user group. 
 We ran into an issue in an attempt to perform a rolling upgrade from 0.8.1.1 
 to 0.8.2.0 (we should have used 0.8.2.1 but accidentally got the wrong 
 binaries). 
 While shutting down the first node, it failed a clean controlled shutdown due to 
 one corrupt topic. The message in server.log was:
 {noformat}
 [2015-03-31 10:21:46,250] INFO [Kafka Server 6], Remaining partitions to 
 move: [__samza_checkpoint_ver_1_for_usersessions_1,0] 
 (kafka.server.KafkaServer)
 [2015-03-31 10:21:46,250] INFO [Kafka Server 6], Error code from controller: 
 0 (kafka.server.KafkaServer)
 {noformat}
 And related message in state-change.log:
 {noformat}
 [2015-03-31 10:21:42,622] TRACE Controller 6 epoch 23 started leader election 
 for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] 
 (state.change.logger)
 [2015-03-31 10:21:42,623] ERROR Controller 6 epoch 23 encountered error while 
 electing leader for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] 
 due to: LeaderAndIsr information doesn't exist for partition 
 [__samza_checkpoint_ver_1_for_usersessions_1,0] in OnlinePartition state. 
 (state.change.logger)
 [2015-03-31 10:21:42,623] TRACE Controller 6 epoch 23 received response 
 correlationId 2360 for a request sent to broker id:8,host:xxx,port:9092 
 (state.change.logger) 
 [2015-03-31 10:21:42,623] ERROR Controller 6 epoch 23 initiated state change 
 for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] from 
 OnlinePartition to OnlinePartition failed (state.change.logger)
 kafka.common.StateChangeFailedException: encountered error while electing 
 leader for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] due to: 
 LeaderAndIsr information doesn't exist for partition 
 [__samza_checkpoint_ver_1_for_usersessions_1,0] in OnlinePartition state.
   at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:360)
   at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:187)
   at 
 kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:125)
   at 
 kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:124)
   at scala.collection.immutable.Set$Set1.foreach(Set.scala:86)
   at 
 kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:124)
   at 
 kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1$$anonfun$apply$mcV$sp$3.apply(KafkaController.scala:257)
   at 
 kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1$$anonfun$apply$mcV$sp$3.apply(KafkaController.scala:253)
   at scala.Option.foreach(Option.scala:197)
   at 
 kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply$mcV$sp(KafkaController.scala:253)
   at 
 kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply(KafkaController.scala:253)
   at 
 kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply(KafkaController.scala:253)
   at kafka.utils.Utils$.inLock(Utils.scala:538)
   at 
 kafka.controller.KafkaController$$anonfun$shutdownBroker$3.apply(KafkaController.scala:252)
   at 
 kafka.controller.KafkaController$$anonfun$shutdownBroker$3.apply(KafkaController.scala:249)
   at 
 scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:130)
   at 
 scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
   at 
 scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
   at 
 kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:249)
   at 
 kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:264)
   at kafka.server.KafkaApis.handle(KafkaApis.scala:192)
   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
   at java.lang.Thread.run(Thread.java:744)
 Caused 

[jira] [Commented] (KAFKA-2101) Metric metadata-age is reset on a failed update

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513318#comment-14513318
 ] 

Neha Narkhede commented on KAFKA-2101:
--

[~timbrooks] To clarify your question - we shouldn't update the metric on 
failed attempts. 

 Metric metadata-age is reset on a failed update
 ---

 Key: KAFKA-2101
 URL: https://issues.apache.org/jira/browse/KAFKA-2101
 Project: Kafka
  Issue Type: Bug
Reporter: Tim Brooks

 In org.apache.kafka.clients.Metadata there is a lastUpdate() method that 
 returns the time the metadata was last updated. This is only called by the 
 metadata-age metric. 
 However, lastRefreshMs is updated on a failed update (when the 
 MetadataResponse has no valid nodes). This is confusing, since the metric's 
 name suggests that it is a true reflection of the age of the current 
 metadata, but the age might be reset by a failed update. 
 Additionally, lastRefreshMs is not reset on a failed update due to no node 
 being available. This seems slightly inconsistent, since one failure 
 condition resets the metric but the other does not, especially since both 
 failure conditions do trigger the backoff (for the next attempt).
 I have not implemented a patch yet, because I am unsure what the expected 
 behavior is.
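 A minimal sketch of the behavior suggested above, assuming we keep the metric's 
 timestamp separate from the backoff timestamp (illustrative only, not the 
 actual org.apache.kafka.clients.Metadata code):
 {code}
// Illustrative sketch (not the real org.apache.kafka.clients.Metadata class):
// keep the timestamp that backs metadata-age separate from the timestamp used
// for retry backoff, so a failed update triggers backoff without resetting the age.
public class MetadataAgeSketch {
    private long lastSuccessfulUpdateMs = 0L; // backs the metadata-age metric
    private long lastAttemptMs = 0L;          // used only for backoff between attempts

    public synchronized void recordUpdateAttempt(boolean hasValidNodes, long nowMs) {
        lastAttemptMs = nowMs;                // every attempt (success or failure) counts for backoff
        if (hasValidNodes) {
            lastSuccessfulUpdateMs = nowMs;   // only a successful update resets the age
        }
    }

    public synchronized long metadataAgeMs(long nowMs) {
        return nowMs - lastSuccessfulUpdateMs;
    }

    public synchronized long timeSinceLastAttemptMs(long nowMs) {
        return nowMs - lastAttemptMs;
    }
}
 {code}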



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2100) Client Error doesn't preserve or display original server error code when it is an unknown code

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2100:
-
Component/s: clients

 Client Error doesn't preserve or display original server error code when it 
 is an unknown code
 --

 Key: KAFKA-2100
 URL: https://issues.apache.org/jira/browse/KAFKA-2100
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Gwen Shapira
Assignee: Gwen Shapira
  Labels: newbie

 When the Java client receives an unfamiliar error code, it translates it into 
 UNKNOWN(-1, new UnknownServerException("The server experienced an unexpected 
 error when processing the request")).
 This completely loses the original code, which makes troubleshooting from the 
 client impossible. 
 It would be better to preserve the original code and write it to the log when 
 logging the error.
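 A minimal sketch of the suggested handling, i.e. keep and log the raw code 
 instead of collapsing it (the ErrorCodeSketch class below is illustrative, not 
 the client's actual error-mapping code):
 {code}
// Illustrative sketch only (not the actual client error-code mapping): when the
// code does not map to a known error, keep the raw value in the log and in the
// exception message instead of collapsing it to UNKNOWN(-1).
public class ErrorCodeSketch {
    public static RuntimeException exceptionFor(short rawCode) {
        switch (rawCode) {
            case 0:
                return null; // no error
            case 1:
                return new RuntimeException("OffsetOutOfRange (example known code)");
            default:
                // Preserve the original code so client-side logs can be correlated
                // with broker-side logs during troubleshooting.
                System.err.println("Received unknown error code " + rawCode + " from the broker");
                return new RuntimeException(
                        "The server experienced an unexpected error (raw error code: " + rawCode + ")");
        }
    }
}
 {code}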



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1621) Standardize --messages option in perf scripts

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513324#comment-14513324
 ] 

Neha Narkhede commented on KAFKA-1621:
--

[~rekhajoshm] Sorry for the delay; I reviewed your PR. However, could you send it 
against trunk instead of 0.8.2?

 Standardize --messages option in perf scripts
 -

 Key: KAFKA-1621
 URL: https://issues.apache.org/jira/browse/KAFKA-1621
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Jay Kreps
  Labels: newbie

 This option is specified in PerfConfig and is used by the producer, consumer 
 and simple consumer perf commands. The docstring on the argument does not 
 list it as required but the producer performance test requires it--others 
 don't.
 We should standardize this so that either all the commands require the option 
 and it is marked as required in the docstring or none of them list it as 
 required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1904) run sanity failed test

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1904:
-
Component/s: system tests

 run sanity failed test
 --

 Key: KAFKA-1904
 URL: https://issues.apache.org/jira/browse/KAFKA-1904
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Joe Stein
Priority: Blocker
 Fix For: 0.8.3

 Attachments: run_sanity.log.gz


 _test_case_name  :  testcase_1
 _test_class_name  :  ReplicaBasicTest
 arg : bounce_broker  :  true
 arg : broker_type  :  leader
 arg : message_producing_free_time_sec  :  15
 arg : num_iteration  :  2
 arg : num_messages_to_produce_per_producer_call  :  50
 arg : num_partition  :  2
 arg : replica_factor  :  3
 arg : sleep_seconds_between_producer_calls  :  1
 validation_status  : 
  Test completed  :  FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2080) quick cleanup of producer performance scripts

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2080:
-
Labels: newbie  (was: )

 quick cleanup of producer performance scripts
 -

 Key: KAFKA-2080
 URL: https://issues.apache.org/jira/browse/KAFKA-2080
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Gwen Shapira
Assignee: Gwen Shapira
  Labels: newbie

 We have two producer performance tools at the moment: one at 
 o.a.k.client.tools and one at kafka.tools.
 bin/kafka-producer-perf-test.sh calls the kafka.tools one.
 org.apache.kafka.clients.tools.ProducerPerformance has --messages listed as 
 optional (with a default), but leaving the parameter out results in an error.
 Cleanup will include:
 * Removing the kafka.tools performance tool
 * Changing the shell script to use the new tool
 * Fixing the misleading documentation for --messages
 * Adding both performance tools to the Kafka docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1858) Make ServerShutdownTest a bit less flaky

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1858:
-
Reviewer: Jun Rao
  Status: Patch Available  (was: Open)

 Make ServerShutdownTest a bit less flaky
 

 Key: KAFKA-1858
 URL: https://issues.apache.org/jira/browse/KAFKA-1858
 Project: Kafka
  Issue Type: Bug
Reporter: Gwen Shapira
 Attachments: KAFKA-1858.patch


 ServerShutdownTest currently:
 * Starts a KafkaServer
 * Does stuff
 * Stops the server
 * Checks whether there are any live Kafka threads
 This is fine on its own. But when running in a test suite (i.e. gradle test), 
 the test is very sensitive to every other test freeing all of its resources. If 
 you start a server in a previous test and forget to close it, 
 ServerShutdownTest will find threads from the previous test and fail.
 This makes for a flaky test that is pretty challenging to troubleshoot.
 I suggest counting the threads at the beginning and end of each test in the 
 class, and only failing if the number at the end is greater than the number 
 at the beginning.
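 A minimal sketch of the before/after counting idea in plain Java (the real test 
 is Scala/JUnit, so the wiring would differ; names below are illustrative):
 {code}
// Sketch of the suggested approach: snapshot the number of live "kafka" threads
// before the test body runs and only fail if the count grew afterwards.
public class ThreadLeakCheckSketch {
    private static long countKafkaThreads() {
        long count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().toLowerCase().contains("kafka")) {
                count++;
            }
        }
        return count;
    }

    public static void runWithLeakCheck(Runnable testBody) {
        long before = countKafkaThreads();
        testBody.run();
        long after = countKafkaThreads();
        if (after > before) {
            throw new AssertionError("Kafka threads leaked: before=" + before + ", after=" + after);
        }
    }
}
 {code}
 This way a server leaked by an earlier test inflates both counts equally and no 
 longer fails this test.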



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2014) Chaos Monkey / Failure Inducer for Kafka

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2014:
-
Component/s: system tests

 Chaos Monkey / Failure Inducer for Kafka
 

 Key: KAFKA-2014
 URL: https://issues.apache.org/jira/browse/KAFKA-2014
 Project: Kafka
  Issue Type: Task
  Components: system tests
Reporter: Mayuresh Gharat
Assignee: Mayuresh Gharat

 Implement a Chaos Monkey for Kafka that will help us catch any shortcomings 
 in the test environment before going to production. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2103) kafka.producer.AsyncProducerTest failure.

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513316#comment-14513316
 ] 

Neha Narkhede commented on KAFKA-2103:
--

[~becket_qin] Is this still an issue? If so, can we make this a sub-task of the 
umbrella JIRA for unit test failures that [~guozhang] created?

 kafka.producer.AsyncProducerTest failure.
 -

 Key: KAFKA-2103
 URL: https://issues.apache.org/jira/browse/KAFKA-2103
 Project: Kafka
  Issue Type: Bug
Reporter: Jiangjie Qin

 I saw this test consistently failing on trunk.
 The recent changes are KAFKA-2099, KAFKA-1926, KAFKA-1809.
 kafka.producer.AsyncProducerTest > testNoBroker FAILED
 org.scalatest.junit.JUnitTestFailedError: Should fail with 
 FailedToSendMessageException
 at 
 org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101)
 at 
 org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:149)
 at org.scalatest.Assertions$class.fail(Assertions.scala:711)
 at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:149)
 at 
 kafka.producer.AsyncProducerTest.testNoBroker(AsyncProducerTest.scala:300)
 kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED
 kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED
 kafka.producer.AsyncProducerTest > testFailedSendRetryLogic FAILED
 kafka.common.FailedToSendMessageException: Failed to send messages after 
 3 tries.
 at 
 kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:91)
 at 
 kafka.producer.AsyncProducerTest.testFailedSendRetryLogic(AsyncProducerTest.scala:415)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2100) Client Error doesn't preserve or display original server error code when it is an unknown code

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2100:
-
Labels: newbie  (was: )

 Client Error doesn't preserve or display original server error code when it 
 is an unknown code
 --

 Key: KAFKA-2100
 URL: https://issues.apache.org/jira/browse/KAFKA-2100
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Gwen Shapira
Assignee: Gwen Shapira
  Labels: newbie

 When the Java client receives an unfamiliar error code, it translates it into 
 UNKNOWN(-1, new UnknownServerException("The server experienced an unexpected 
 error when processing the request")).
 This completely loses the original code, which makes troubleshooting from the 
 client impossible. 
 It would be better to preserve the original code and write it to the log when 
 logging the error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1758) corrupt recovery file prevents startup

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1758:
-
Reviewer: Neha Narkhede

 corrupt recovery file prevents startup
 --

 Key: KAFKA-1758
 URL: https://issues.apache.org/jira/browse/KAFKA-1758
 Project: Kafka
  Issue Type: Bug
  Components: log
Reporter: Jason Rosenberg
Assignee: Manikumar Reddy
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1758.patch


 Hi,
 We recently had a Kafka node go down suddenly. When it came back up, it 
 apparently had a corrupt recovery file and refused to start up:
 {code}
 2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up 
 KafkaServer
 java.lang.NumberFormatException: For input string: 
 ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
 ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 at java.lang.Integer.parseInt(Integer.java:481)
 at java.lang.Integer.parseInt(Integer.java:527)
 at 
 scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
 at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
 at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
 at 
 kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
 at 
 kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
 at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at 
 scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
 at kafka.log.LogManager.loadLogs(LogManager.scala:105)
 at kafka.log.LogManager.init(LogManager.scala:57)
 at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
 {code}
 And the app is under a monitor (so it was repeatedly restarting and failing 
 with this error for several minutes before we got to it)…
 We moved the 'recovery-point-offset-checkpoint' file out of the way, and it 
 then restarted cleanly (but of course re-synced all of its data from replicas, 
 so we had no data loss).
 Anyway, I'm wondering: is that the expected behavior? Or should it not declare 
 the file corrupt and instead proceed automatically to an unclean restart?
 Should this NumberFormatException be handled a bit more gracefully?
 We saved the corrupt file if it’s worth inspecting (although I doubt it will 
 be useful!)….
 The corrupt files appeared to be all zeroes.
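 A minimal sketch of the more graceful handling being asked about, treating an 
 unparsable checkpoint as absent so the broker falls back to recovery instead of 
 refusing to start (illustrative only, not the actual OffsetCheckpoint code; the 
 line format shown is assumed for the sketch):
 {code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: read a recovery-point style checkpoint, but treat a corrupt
// file (e.g. all zero bytes) as if it were missing instead of aborting startup.
public class CheckpointReadSketch {
    // Assumed line format for this sketch: "<topic-partition> <offset>"
    public static Map<String, Long> readOrEmpty(Path checkpointFile) {
        try {
            Map<String, Long> offsets = new HashMap<>();
            List<String> lines = Files.readAllLines(checkpointFile);
            for (String line : lines) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length == 2) {
                    offsets.put(parts[0], Long.parseLong(parts[1]));
                }
            }
            return offsets;
        } catch (IOException | NumberFormatException e) {
            // Corrupt or unreadable checkpoint: log it and return an empty map, which
            // forces recovery from the log / replicas rather than refusing to start.
            System.err.println("Ignoring corrupt checkpoint " + checkpointFile + ": " + e);
            return Collections.emptyMap();
        }
    }
}
 {code}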



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1758) corrupt recovery file prevents startup

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513186#comment-14513186
 ] 

Neha Narkhede commented on KAFKA-1758:
--

[~omkreddy] I took a quick look and left a few review comments. Should be able 
to merge once you fix those.

 corrupt recovery file prevents startup
 --

 Key: KAFKA-1758
 URL: https://issues.apache.org/jira/browse/KAFKA-1758
 Project: Kafka
  Issue Type: Bug
  Components: log
Reporter: Jason Rosenberg
Assignee: Manikumar Reddy
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1758.patch


 Hi,
 We recently had a Kafka node go down suddenly. When it came back up, it 
 apparently had a corrupt recovery file and refused to start up:
 {code}
 2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up 
 KafkaServer
 java.lang.NumberFormatException: For input string: 
 ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
 ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 at java.lang.Integer.parseInt(Integer.java:481)
 at java.lang.Integer.parseInt(Integer.java:527)
 at 
 scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
 at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
 at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
 at 
 kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
 at 
 kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
 at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at 
 scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
 at kafka.log.LogManager.loadLogs(LogManager.scala:105)
 at kafka.log.LogManager.init(LogManager.scala:57)
 at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
 {code}
 And the app is under a monitor (so it was repeatedly restarting and failing 
 with this error for several minutes before we got to it)…
 We moved the 'recovery-point-offset-checkpoint' file out of the way, and it 
 then restarted cleanly (but of course re-synced all of its data from replicas, 
 so we had no data loss).
 Anyway, I'm wondering: is that the expected behavior? Or should it not declare 
 the file corrupt and instead proceed automatically to an unclean restart?
 Should this NumberFormatException be handled a bit more gracefully?
 We saved the corrupt file if it’s worth inspecting (although I doubt it will 
 be useful!)….
 The corrupt files appeared to be all zeroes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2150) FetcherThread backoff need to grab lock before wait on condition.

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2150:
-
Reviewer: Guozhang Wang

[~guozhang] This was introduced in KAFKA-1461, so assigning to you for review 
since you reviewed that one :)

 FetcherThread backoff need to grab lock before wait on condition.
 -

 Key: KAFKA-2150
 URL: https://issues.apache.org/jira/browse/KAFKA-2150
 Project: Kafka
  Issue Type: Bug
Reporter: Jiangjie Qin
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Attachments: KAFKA-2150.patch, KAFKA-2150_2015-04-25_13:14:05.patch, 
 KAFKA-2150_2015-04-25_13:18:35.patch, KAFKA-2150_2015-04-25_13:35:36.patch


 Saw the following error: 
 kafka.api.ProducerBounceTest > testBrokerFailure STANDARD_OUT
 [2015-04-25 00:40:43,997] ERROR [ReplicaFetcherThread-0-0], Error due to  
 (kafka.server.ReplicaFetcherThread:103)
 java.lang.IllegalMonitorStateException
   at 
 java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:127)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1239)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.fullyRelease(AbstractQueuedSynchronizer.java:1668)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2107)
   at 
 kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:95)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 [2015-04-25 00:40:47,064] ERROR [ReplicaFetcherThread-0-1], Error due to  
 (kafka.server.ReplicaFetcherThread:103)
 java.lang.IllegalMonitorStateException
   at 
 java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:127)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1239)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.fullyRelease(AbstractQueuedSynchronizer.java:1668)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2107)
   at 
 kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:95)
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 We should grab the lock before waiting on the condition.
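 For reference, a minimal sketch of the locking pattern the fix needs: 
 Condition.await() may only be called while the owning lock is held, otherwise 
 it throws IllegalMonitorStateException (field names below are illustrative):
 {code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Minimal illustration of the failure above: Condition.await() requires the owning
// lock to be held, otherwise it throws IllegalMonitorStateException.
public class BackoffSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();

    public void backoff(long ms) throws InterruptedException {
        lock.lock();                       // acquire the lock before awaiting
        try {
            cond.await(ms, TimeUnit.MILLISECONDS);
        } finally {
            lock.unlock();                 // always release, even on interruption
        }
    }
}
 {code}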



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513265#comment-14513265
 ] 

Neha Narkhede commented on KAFKA-1367:
--

[~jkreps] Yup, issue still exists and the solution I still recommend is to have 
the controller register watches and know the latest ISR for all partitions. 
This change isn't big if someone wants to take a stab. 

 Broker topic metadata not kept in sync with ZooKeeper
 -

 Key: KAFKA-1367
 URL: https://issues.apache.org/jira/browse/KAFKA-1367
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Ryan Berdeen
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1367.txt


 When a broker is restarted, the topic metadata responses from the brokers 
 will be incorrect (different from ZooKeeper) until a preferred replica leader 
 election.
 In the metadata, it looks like leaders are correctly removed from the ISR 
 when a broker disappears, but followers are not. Then, when a broker 
 reappears, the ISR is never updated.
 I used a variation of the Vagrant setup created by Joe Stein to reproduce 
 this with latest from the 0.8.1 branch: 
 https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2027) kafka never notifies the zookeeper client when a partition moved with due to an auto-rebalance (when auto.leader.rebalance.enable=true)

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513151#comment-14513151
 ] 

Neha Narkhede commented on KAFKA-2027:
--

Can you describe the end user behavior that you are looking for and the reason 
behind it?

 kafka never notifies the zookeeper client when a partition moved with due to 
 an auto-rebalance (when auto.leader.rebalance.enable=true)
 ---

 Key: KAFKA-2027
 URL: https://issues.apache.org/jira/browse/KAFKA-2027
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 0.8.1.1
 Environment: Kafka 0.8.1.1, Node.js & Mac OS
Reporter: Sampath Reddy Lambu
Assignee: Neha Narkhede
Priority: Blocker

 I would like to report an issue that occurs when auto.leader.rebalance.enable=true. 
 Kafka never sends an event/notification to its ZooKeeper client after a preferred 
 replica election completes. This works fine with a manual rebalance from the CLI 
 (kafka-preferred-replica-election.sh).
 Initially I thought this issue was with Kafka-Node, but it's not. 
 An event should be emitted from ZooKeeper if any partition moved during a 
 preferred replica election.
 I'm working with kafka_2.9.2-0.8.1.1 (brokers) & Kafka-Node (Node.js).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2105) NullPointerException in client on MetadataRequest

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2105:
-
Status: Patch Available  (was: Open)

 NullPointerException in client on MetadataRequest
 -

 Key: KAFKA-2105
 URL: https://issues.apache.org/jira/browse/KAFKA-2105
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.8.2.1
Reporter: Roger Hoover
Priority: Minor
 Attachments: guard-from-null.patch


 With the new producer, if you accidentally call 
 KafkaProducer.partitionsFor(null), it will cause the I/O thread to throw an NPE.
 Uncaught error in kafka producer I/O thread: 
 java.lang.NullPointerException
   at org.apache.kafka.common.utils.Utils.utf8Length(Utils.java:174)
   at org.apache.kafka.common.protocol.types.Type$5.sizeOf(Type.java:176)
   at 
 org.apache.kafka.common.protocol.types.ArrayOf.sizeOf(ArrayOf.java:55)
   at org.apache.kafka.common.protocol.types.Schema.sizeOf(Schema.java:81)
   at org.apache.kafka.common.protocol.types.Struct.sizeOf(Struct.java:218)
   at 
 org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35)
   at 
 org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29)
   at 
 org.apache.kafka.clients.NetworkClient.metadataRequest(NetworkClient.java:369)
   at 
 org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:391)
   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:188)
   at 
 org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
   at 
 org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
   at java.lang.Thread.run(Thread.java:745)
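 A minimal sketch of the kind of guard the attached patch is about: validate the 
 topic in the calling thread so the background I/O thread never sees the null 
 (the wrapper below is illustrative, not the producer's actual code):
 {code}
// Illustrative guard only: fail fast in the calling thread instead of letting a
// null topic reach the sender/IO thread and surface there as an uncaught NPE.
public class PartitionsForGuardSketch {
    public static String requireValidTopic(String topic) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("Topic must be non-null and non-empty");
        }
        return topic;
    }
    // Usage (hypothetical): producer.partitionsFor(requireValidTopic(topic));
}
 {code}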



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2029) Improving controlled shutdown for rolling updates

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513138#comment-14513138
 ] 

Neha Narkhede commented on KAFKA-2029:
--

[~jkreps] That's somewhat right. When the controller was simple and we didn't 
have the new non-blocking NetworkClient, the single-queue-based blocking I/O 
wasn't a bad place to start. Now, it is time to refactor. Though I haven't 
given this a lot of thought, I'm not sure if a single-threaded design will 
work. It *might* be possible to end up with a single-threaded controller, 
though I can't be entirely sure. Basically, we have to think about situations 
where something has to happen in the callback, where that callback executes, 
and whether that callback execution should block sending other commands.

[~dmitrybugaychenko] Thanks for looking into this. Some of the problems you 
listed above should be resolved with the unlimited controller-to-broker queue 
change. For other changes, I highly recommend we look at the new network client 
and propose a design based on that. I also think we should avoid patching the 
current controller and be really careful about testing. We have found that any 
change to the controller introduces subtle bugs causing serious 
instability. Let's start with a design doc and agree on that. I can help review 
it. 

 Improving controlled shutdown for rolling updates
 -

 Key: KAFKA-2029
 URL: https://issues.apache.org/jira/browse/KAFKA-2029
 Project: Kafka
  Issue Type: Improvement
  Components: controller
Affects Versions: 0.8.1.1
Reporter: Dmitry Bugaychenko
Assignee: Neha Narkhede
Priority: Critical
 Attachments: KAFKA-2029.patch, KAFKA-2029.patch


 Controlled shutdown as implemented currently can cause numerous problems: 
 deadlocks, local and global data loss, partitions without a leader, etc. In 
 some cases the only way to restore the cluster is to stop it completely using 
 kill -9 and start again.
 Note 1: I'm aware of KAFKA-1305, but the proposed workaround of increasing the 
 queue size makes things much worse (see the discussion there).
 Note 2: The problems described here can occur in any setup, but they are 
 extremely painful in a setup with large brokers (36 disks, 1000+ assigned 
 partitions per broker in our case).
 Note 3: These improvements are actually workarounds, and it is worth 
 considering a global refactoring of the controller (make it single-threaded, 
 or even get rid of it in favour of ZK leader elections for partitions).
 The problems and improvements are:
 # Controlled shutdown takes a long time (10+ minutes); the broker sends 
 multiple shutdown requests, finally considers the shutdown failed, and 
 proceeds to an unclean shutdown, while the controller gets stuck for a while 
 (holding a lock while waiting for free space in the controller-to-broker 
 queue). After the broker starts back up it receives a followers request and 
 erases its high watermarks (with a message that the replica does not exist - 
 the controller hadn't yet sent a request with the replica assignment); then, 
 when the controller starts replicas on the broker, it deletes all local data 
 (due to the missing high watermarks). Furthermore, the controller starts 
 processing the pending shutdown request and stops replicas on the broker, 
 leaving it in a non-functional state. A solution might be to increase the time 
 the broker waits for the controller's response to the shutdown request, but 
 this timeout is taken from controller.socket.timeout.ms, which is global for 
 all broker-controller communication, and setting it to 30 minutes is 
 dangerous. *Proposed solution: introduce a dedicated config parameter for this 
 timeout with a high default*.
 # If a broker goes down during controlled shutdown and does not come back, the 
 controller gets stuck in a deadlock (one thread owns the lock and tries to add 
 a message to the dead broker's queue, the send thread is in an infinite loop 
 trying to retry the message to the dead broker, and the broker failure handler 
 is waiting for the lock). There are numerous partitions without a leader and 
 the only way out is to kill -9 the controller. *Proposed solution: add a 
 timeout for adding a message to the broker's queue*. 
 ControllerChannelManager.sendRequest:
 {code}
   def sendRequest(brokerId: Int, request: RequestOrResponse, callback: (RequestOrResponse) => Unit = null) {
     brokerLock synchronized {
       val stateInfoOpt = brokerStateInfo.get(brokerId)
       stateInfoOpt match {
         case Some(stateInfo) =>
           // ODKL Patch: prevent infinite hang on trying to send message to a dead broker.
           // TODO: Move timeout to config
           if (!stateInfo.messageQueue.offer((request, callback), 10, TimeUnit.SECONDS)) {
             error("Timed out trying to send message to broker " + brokerId.toString)
             // Do not throw, as it 

[jira] [Updated] (KAFKA-2029) Improving controlled shutdown for rolling updates

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2029:
-
Reviewer: Neha Narkhede

 Improving controlled shutdown for rolling updates
 -

 Key: KAFKA-2029
 URL: https://issues.apache.org/jira/browse/KAFKA-2029
 Project: Kafka
  Issue Type: Improvement
  Components: controller
Affects Versions: 0.8.1.1
Reporter: Dmitry Bugaychenko
Assignee: Neha Narkhede
Priority: Critical
 Attachments: KAFKA-2029.patch, KAFKA-2029.patch


 Controlled shutdown as implemented currently can cause numerous problems: 
 deadlocks, local and global data loss, partitions without a leader, etc. In 
 some cases the only way to restore the cluster is to stop it completely using 
 kill -9 and start again.
 Note 1: I'm aware of KAFKA-1305, but the proposed workaround of increasing the 
 queue size makes things much worse (see the discussion there).
 Note 2: The problems described here can occur in any setup, but they are 
 extremely painful in a setup with large brokers (36 disks, 1000+ assigned 
 partitions per broker in our case).
 Note 3: These improvements are actually workarounds, and it is worth 
 considering a global refactoring of the controller (make it single-threaded, 
 or even get rid of it in favour of ZK leader elections for partitions).
 The problems and improvements are:
 # Controlled shutdown takes a long time (10+ minutes); the broker sends 
 multiple shutdown requests, finally considers the shutdown failed, and 
 proceeds to an unclean shutdown, while the controller gets stuck for a while 
 (holding a lock while waiting for free space in the controller-to-broker 
 queue). After the broker starts back up it receives a followers request and 
 erases its high watermarks (with a message that the replica does not exist - 
 the controller hadn't yet sent a request with the replica assignment); then, 
 when the controller starts replicas on the broker, it deletes all local data 
 (due to the missing high watermarks). Furthermore, the controller starts 
 processing the pending shutdown request and stops replicas on the broker, 
 leaving it in a non-functional state. A solution might be to increase the time 
 the broker waits for the controller's response to the shutdown request, but 
 this timeout is taken from controller.socket.timeout.ms, which is global for 
 all broker-controller communication, and setting it to 30 minutes is 
 dangerous. *Proposed solution: introduce a dedicated config parameter for this 
 timeout with a high default*.
 # If a broker goes down during controlled shutdown and does not come back, the 
 controller gets stuck in a deadlock (one thread owns the lock and tries to add 
 a message to the dead broker's queue, the send thread is in an infinite loop 
 trying to retry the message to the dead broker, and the broker failure handler 
 is waiting for the lock). There are numerous partitions without a leader and 
 the only way out is to kill -9 the controller. *Proposed solution: add a 
 timeout for adding a message to the broker's queue*. 
 ControllerChannelManager.sendRequest:
 {code}
   def sendRequest(brokerId: Int, request: RequestOrResponse, callback: (RequestOrResponse) => Unit = null) {
     brokerLock synchronized {
       val stateInfoOpt = brokerStateInfo.get(brokerId)
       stateInfoOpt match {
         case Some(stateInfo) =>
           // ODKL Patch: prevent infinite hang on trying to send message to a dead broker.
           // TODO: Move timeout to config
           if (!stateInfo.messageQueue.offer((request, callback), 10, TimeUnit.SECONDS)) {
             error("Timed out trying to send message to broker " + brokerId.toString)
             // Do not throw, as it brings controller into completely non-functional state
             // ("Controller to broker state change requests batch is not empty while creating a new one")
             //throw new IllegalStateException("Timed out trying to send message to broker " + brokerId.toString)
           }
         case None =>
           warn("Not sending request %s to broker %d, since it is offline.".format(request, brokerId))
       }
     }
   }
 {code}
 # When the broker that is the controller starts to shut down while auto leader 
 rebalance is running, it deadlocks in the end (the shutdown thread owns the 
 lock and waits for the rebalance thread to exit, and the rebalance thread 
 waits for the lock). *Proposed solution: use a bounded wait in the rebalance 
 thread*. KafkaController.scala:
 {code}
   // ODKL Patch to prevent deadlocks in shutdown.
   /**
    * Execute the given function inside the lock
    */
   def inLockIfRunning[T](lock: ReentrantLock)(fun: => T): T = {
     if (isRunning || lock.isHeldByCurrentThread) {
       // TODO: Configure timeout.
       if (!lock.tryLock(10, TimeUnit.SECONDS)) {
         throw new IllegalStateException("Failed to acquire controller lock in 10 seconds.")
       }
       try {

[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513202#comment-14513202
 ] 

Neha Narkhede commented on KAFKA-2140:
--

Fixed and pushed.

 Improve code readability
 

 Key: KAFKA-2140
 URL: https://issues.apache.org/jira/browse/KAFKA-2140
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Minor
 Attachments: KAFKA-2140-fix.patch, KAFKA-2140.patch


 There are a number of places where code could be written in a more readable 
 and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513159#comment-14513159
 ] 

Neha Narkhede commented on KAFKA-1887:
--

[~sriharsha] The patch is waiting for a comment. If you can squeeze that in, 
I'll help you merge this change. 

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch


 We always see the following error in the state-change log when shutting down 
 the last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1940) Initial checkout and build failing

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1940:
-
Status: Patch Available  (was: Open)

 Initial checkout and build failing
 --

 Key: KAFKA-1940
 URL: https://issues.apache.org/jira/browse/KAFKA-1940
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.1.2
 Environment: Groovy:   1.8.6
 Ant:  Apache Ant(TM) version 1.9.2 compiled on July 8 2013
 Ivy:  2.2.0
 JVM:  1.8.0_25 (Oracle Corporation 25.25-b02)
 OS:   Windows 7 6.1 amd64
Reporter: Martin Lemanski
  Labels: build
 Attachments: zinc-upgrade.patch


 When performing `gradle wrapper` and `gradlew build` as a new developer, I 
 get an exception: 
 {code}
 C:\development\git\kafka>gradlew build --stacktrace
 ...
 FAILURE: Build failed with an exception.
 * What went wrong:
 Execution failed for task ':core:compileScala'.
  > com.typesafe.zinc.Setup.create(Lcom/typesafe/zinc/ScalaLocation;Lcom/typesafe/zinc/SbtJars;Ljava/io/File;)Lcom/typesafe/zinc/Setup;
 {code}
 Details: https://gist.github.com/mlem/ddff83cc8a25b040c157
 Current Commit:
 {code}
 C:\development\git\kafka>git rev-parse --verify HEAD
 71602de0bbf7727f498a812033027f6cbfe34eb8
 {code}
 I am evaluating Kafka for my company and wanted to run some tests with it, 
 but couldn't due to this error. I know Gradle can be tricky and it's not easy 
 to set everything up correctly, but this kind of bug turns possible 
 committers/users off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)

2015-04-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503169#comment-14503169
 ] 

Neha Narkhede commented on KAFKA-873:
-

I'm not so sure yet that moving to Curator is a good idea, at least not until 
we do a full analysis of the current zkclient problems and how Curator fixes 
those. Agreed that zkclient is not very well supported, but anytime we have 
found a serious bug, they have accepted the patch and released it. But my 
understanding is that the upside of Curator is that it includes a set of 
recipes for common operations that people use ZooKeeper for. 

Let me elaborate on what I think is the problem with zkclient. It wraps the 
zookeeper client APIs with the purpose of making it easy to perform common 
ZooKeeper operations. However, this limits the user to the behavior dictated by 
the wrapper, irrespective of how the underlying zookeeper library behaves. An 
example of this is the indefinite retries during a ZooKeeper disconnect. You 
may not want to retry indefinitely and might want to quit the operation after a 
timeout. Then there are various bugs introduced due to the zkclient wrapper 
design. For example, we have seen weird bugs due to the fact that zkclient 
saves the list of triggered watches in an internal queue and invokes the 
configured user callback in a background thread. 

The problems we've seen with zkclient will not be fixed with another wrapper 
(Curator). It looks like it will be better for us to just write a simple 
ZooKeeper client utility inside Kafka itself. If you look at zkclient, it is a 
pretty simple wrapper over the ZooKeeper client APIs, so this may not be a huge 
undertaking and will be a better long-term solution.

 Consider replacing zkclient with curator (with zkclient-bridge)
 ---

 Key: KAFKA-873
 URL: https://issues.apache.org/jira/browse/KAFKA-873
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Scott Clasen
Assignee: Grant Henke

 If zkclient was replaced with Curator and curator-x-zkclient-bridge, it would 
 initially be a drop-in replacement:
 https://github.com/Netflix/curator/wiki/ZKClient-Bridge
 With the addition of a few more props to ZkConfig, and a bit of code, this 
 would open up the possibility of using ACLs in ZooKeeper (which aren't 
 supported directly by zkclient), as well as integrating with Netflix 
 Exhibitor for those of us using that.
 Looks like KafkaZookeeperClient needs some love anyhow...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2057) DelayedOperationTest.testRequestExpiry transient failure

2015-04-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2057:
-
Resolution: Fixed
  Reviewer: Neha Narkhede
Status: Resolved  (was: Patch Available)

Thanks for the patch. Pushed to trunk.

 DelayedOperationTest.testRequestExpiry transient failure
 

 Key: KAFKA-2057
 URL: https://issues.apache.org/jira/browse/KAFKA-2057
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang
Assignee: Rajini Sivaram
  Labels: newbie
 Attachments: KAFKA-2057.patch


 {code}
 kafka.server.DelayedOperationTest  testRequestExpiry FAILED
 junit.framework.AssertionFailedError: Time for expiration 19 should at 
 least 20
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.server.DelayedOperationTest.testRequestExpiry(DelayedOperationTest.scala:68)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2087) TopicConfigManager javadoc references incorrect paths

2015-04-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2087:
-
Reviewer: Neha Narkhede

 TopicConfigManager javadoc references incorrect paths
 -

 Key: KAFKA-2087
 URL: https://issues.apache.org/jira/browse/KAFKA-2087
 Project: Kafka
  Issue Type: Bug
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar
Priority: Trivial
 Attachments: KAFKA-2087.patch


 The TopicConfigManager docs refer to znodes in 
 /brokers/topics/topic_name/config, which is incorrect.
 Fix the javadoc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2087) TopicConfigManager javadoc references incorrect paths

2015-04-20 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2087:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to trunk. Thanks!

 TopicConfigManager javadoc references incorrect paths
 -

 Key: KAFKA-2087
 URL: https://issues.apache.org/jira/browse/KAFKA-2087
 Project: Kafka
  Issue Type: Bug
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar
Priority: Trivial
 Attachments: KAFKA-2087.patch


 The TopicConfigManager docs refer to znodes in 
 /brokers/topics/topic_name/config, which is incorrect.
 Fix the javadoc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1961) Looks like its possible to delete _consumer_offsets topic

2015-04-03 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394856#comment-14394856
 ] 

Neha Narkhede commented on KAFKA-1961:
--

+1. Checking it in...

 Looks like its possible to delete _consumer_offsets topic
 -

 Key: KAFKA-1961
 URL: https://issues.apache.org/jira/browse/KAFKA-1961
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.0
Reporter: Gwen Shapira
Assignee: Gwen Shapira
  Labels: newbie
 Attachments: KAFKA-1961-6.patch, KAFKA-1961.3.patch, 
 KAFKA-1961.4.patch


 Noticed that kafka-topics.sh --delete can successfully delete internal topics 
 (__consumer_offsets).
 I'm pretty sure we want to prevent that, to avoid users shooting themselves 
 in the foot.
 The topic admin command should check for internal topics, just like 
 ReplicaManager does, and not let users delete them.
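 A minimal sketch of the kind of guard being proposed (names are illustrative; 
 the real check would live in the topic command / admin path, alongside 
 whatever ReplicaManager uses):
 {code}
import java.util.Collections;
import java.util.Set;

// Illustrative sketch: refuse to delete internal topics from the admin tooling,
// mirroring the kind of check ReplicaManager already applies on the broker side.
public class InternalTopicGuardSketch {
    private static final Set<String> INTERNAL_TOPICS =
            Collections.singleton("__consumer_offsets");

    public static void checkDeletable(String topic) {
        if (INTERNAL_TOPICS.contains(topic)) {
            throw new IllegalArgumentException(
                    "Topic " + topic + " is a Kafka internal topic and may not be deleted");
        }
    }
}
 {code}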



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1961) Looks like its possible to delete _consumer_offsets topic

2015-04-03 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1961:
-
Resolution: Fixed
  Assignee: Ted Malaska  (was: Gwen Shapira)
Status: Resolved  (was: Patch Available)

Thanks for the patch, Ted! Pushed to trunk.

 Looks like its possible to delete _consumer_offsets topic
 -

 Key: KAFKA-1961
 URL: https://issues.apache.org/jira/browse/KAFKA-1961
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.0
Reporter: Gwen Shapira
Assignee: Ted Malaska
  Labels: newbie
 Attachments: KAFKA-1961-6.patch, KAFKA-1961.3.patch, 
 KAFKA-1961.4.patch


 Noticed that kafka-topics.sh --delete can successfully delete internal topics 
 (__consumer_offsets).
 I'm pretty sure we want to prevent that, to avoid users shooting themselves 
 in the foot.
 The topic admin command should check for internal topics, just like 
 ReplicaManager does, and not let users delete them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1293) Mirror maker housecleaning

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386039#comment-14386039
 ] 

Neha Narkhede edited comment on KAFKA-1293 at 3/30/15 12:15 AM:


cc [~junrao] who can help with the access to the wiki.


was (Author: nehanarkhede):
cc [~junrao]

 Mirror maker housecleaning
 --

 Key: KAFKA-1293
 URL: https://issues.apache.org/jira/browse/KAFKA-1293
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.1
Reporter: Jay Kreps
Priority: Minor
  Labels: usability
 Attachments: KAFKA-1293.patch


 Mirror maker uses its own convention for command-line arguments, e.g. 
 --num.producers, whereas everything else follows the Unix convention like 
 --num-producers. This is annoying because when running different tools you 
 have to constantly remember the quirks of whoever wrote each tool.
 Mirror maker should also have a top-level wrapper script in bin/ to make tab 
 completion work and so you don't have to remember the fully qualified class 
 name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1293) Mirror maker housecleaning

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386039#comment-14386039
 ] 

Neha Narkhede commented on KAFKA-1293:
--

cc [~junrao]

 Mirror maker housecleaning
 --

 Key: KAFKA-1293
 URL: https://issues.apache.org/jira/browse/KAFKA-1293
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.1
Reporter: Jay Kreps
Priority: Minor
  Labels: usability
 Attachments: KAFKA-1293.patch


 Mirror maker uses its own convention for command-line arguments, e.g. 
 --num.producers, whereas everything else follows the Unix convention like 
 --num-producers. This is annoying because when running different tools you 
 have to constantly remember the quirks of whoever wrote each tool.
 Mirror maker should also have a top-level wrapper script in bin/ to make tab 
 completion work and so you don't have to remember the fully qualified class 
 name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1994) Evaluate performance effect of chroot check on Topic creation

2015-03-29 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1994:
-
Reviewer: Jun Rao

 Evaluate performance effect of chroot check on Topic creation
 -

 Key: KAFKA-1994
 URL: https://issues.apache.org/jira/browse/KAFKA-1994
 Project: Kafka
  Issue Type: Improvement
Reporter: Ashish K Singh
Assignee: Ashish K Singh
 Attachments: KAFKA-1994.patch, KAFKA-1994_2015-03-03_18:19:45.patch


 KAFKA-1664 adds a check for the chroot while creating a node in ZK. ZkPath 
 checks whether the namespace exists before trying to create a path in ZK. This 
 raises the concern that checking the namespace for each path creation might be 
 unnecessary and can potentially make creations expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386037#comment-14386037
 ] 

Neha Narkhede commented on KAFKA-1215:
--

[~allenxwang] This was inactive for a while, but I think it will be good to 
wait until KAFKA-1792 is done to propose a solution for rack-awareness. 

 Rack-Aware replica assignment option
 

 Key: KAFKA-1215
 URL: https://issues.apache.org/jira/browse/KAFKA-1215
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0
Reporter: Joris Van Remoortere
Assignee: Jun Rao
 Fix For: 0.9.0

 Attachments: rack_aware_replica_assignment_v1.patch, 
 rack_aware_replica_assignment_v2.patch


 Adding a rack-id to the Kafka config. This rack-id can be used during replica 
 assignment via the max-rack-replication argument in the admin scripts 
 (create topic, etc.). By default the original replica assignment 
 algorithm is used, because max-rack-replication defaults to -1. 
 max-rack-replication > -1 is not honored if you are doing manual replica 
 assignment (preferred).
 If this looks good I can add some test cases specific to the rack-aware 
 assignment.
 I can also port this to trunk. We are currently running 0.8.0 in production 
 and need this, so I wrote the patch against that.
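 For illustration only (this is not the attached patch), a simplified sketch of the 
 rack-aware idea: group brokers by rack and cycle across racks so a partition's 
 replicas land on as many distinct racks as possible.
 {code}
 import java.util.*;

 // Illustrative sketch of rack-aware replica assignment: cycle across racks so
 // that replicas of a partition land on as many distinct racks as possible.
 // This is not the patch attached to the ticket, just the general idea.
 public class RackAwareAssignmentSketch {

     /** brokerRacks maps broker id -> rack id. */
     public static List<Integer> assignReplicas(Map<Integer, String> brokerRacks,
                                                int partition,
                                                int replicationFactor) {
         if (replicationFactor > brokerRacks.size()) {
             throw new IllegalArgumentException("replication factor exceeds number of brokers");
         }
         // Group brokers by rack, with deterministic ordering.
         SortedMap<String, List<Integer>> byRack = new TreeMap<>();
         for (Map.Entry<Integer, String> e : new TreeMap<>(brokerRacks).entrySet()) {
             byRack.computeIfAbsent(e.getValue(), r -> new ArrayList<>()).add(e.getKey());
         }
         List<String> racks = new ArrayList<>(byRack.keySet());

         List<Integer> replicas = new ArrayList<>();
         int rackOffset = partition;                  // shift the starting rack per partition
         int brokerOffset = partition / racks.size(); // shift within racks as partitions grow
         for (int i = 0; replicas.size() < replicationFactor; i++) {
             List<Integer> rackBrokers = byRack.get(racks.get((rackOffset + i) % racks.size()));
             Integer candidate = rackBrokers.get((brokerOffset + i / racks.size()) % rackBrokers.size());
             if (!replicas.contains(candidate)) {
                 replicas.add(candidate);
             }
         }
         return replicas;
     }

     public static void main(String[] args) {
         Map<Integer, String> brokerRacks = Map.of(0, "rackA", 1, "rackA", 2, "rackB", 3, "rackB");
         System.out.println(assignReplicas(brokerRacks, 0, 2)); // picks one broker per rack, e.g. [0, 2]
     }
 }
 {code}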



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385996#comment-14385996
 ] 

Neha Narkhede commented on KAFKA-2046:
--

[~onurkaraman] Shouldn't controller.message.queue.size be infinite? It seems 
that the moment there is some backlog of state changes on the controller, it 
will deadlock, causing bad things to happen.
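A contrived sketch of the failure mode this comment is worried about (not the 
actual controller code): if the only thread draining a bounded queue can also 
block on put() into that same queue, a backlog wedges it permanently.
{code}
import java.util.concurrent.LinkedBlockingQueue;

// Contrived illustration of the bounded-queue deadlock concern, not Kafka code.
// A single thread both drains the queue and, while handling an item, enqueues
// follow-up work. Once the queue is full, put() blocks and nothing drains it again.
public class BoundedQueueDeadlockSketch {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> stateChangeQueue = new LinkedBlockingQueue<>(2); // small bound

        Thread controllerThread = new Thread(() -> {
            try {
                while (true) {
                    String event = stateChangeQueue.take();
                    // Handling one event produces two follow-up events; with a
                    // bound of 2 the second put() eventually blocks forever.
                    stateChangeQueue.put(event + "-followup-1");
                    stateChangeQueue.put(event + "-followup-2");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        controllerThread.setDaemon(true);

        stateChangeQueue.put("broker-change");
        controllerThread.start();
        controllerThread.join(2000);
        System.out.println("queue size after 2s: " + stateChangeQueue.size()
                + " (controller thread is stuck on put)");
    }
}
{code}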

 Delete topic still doesn't work
 ---

 Key: KAFKA-2046
 URL: https://issues.apache.org/jira/browse/KAFKA-2046
 Project: Kafka
  Issue Type: Bug
Reporter: Clark Haskins
Assignee: Onur Karaman

 I just attempted to delete a 128-partition topic with all inbound producers 
 stopped.
 The result was as follows:
 The /admin/delete_topics znode was empty
 The topic under /brokers/topics was removed
 The Kafka topics command showed that the topic was removed
 However, the data on disk on each broker was not deleted, and the topic has 
 not yet been re-created by starting up the inbound mirror maker.
 Let me know what additional information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385999#comment-14385999
 ] 

Neha Narkhede commented on KAFKA-527:
-

[~guozhang] +1. Will be great to re-run the test after your patch.

 Compression support does numerous byte copies
 -

 Key: KAFKA-527
 URL: https://issues.apache.org/jira/browse/KAFKA-527
 Project: Kafka
  Issue Type: Bug
  Components: compression
Reporter: Jay Kreps
Assignee: Yasuhiro Matsuda
Priority: Critical
 Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, 
 KAFKA-527_2015-03-16_15:19:29.patch, KAFKA-527_2015-03-19_21:32:24.patch, 
 KAFKA-527_2015-03-25_12:08:00.patch, KAFKA-527_2015-03-25_13:26:36.patch, 
 java.hprof.no-compression.txt, java.hprof.snappy.text


 The data path for compressing or decompressing messages is extremely 
 inefficient. We do something like 7 (?) complete copies of the data, often 
 for simple things like adding a 4-byte size to the front. I am not sure how 
 this went unnoticed.
 This is likely the root cause of the performance issues we saw when doing bulk 
 recompression of data in mirror maker.
 The mismatch between the InputStream and OutputStream interfaces and the 
 byte-buffer-based Message/MessageSet interfaces is the cause of many of these 
 copies.
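 As an illustration of the stream/byte-buffer mismatch (not the actual Kafka code 
 path), compressing through an OutputStream and then repackaging the result into a 
 size-prefixed ByteBuffer already forces several whole-payload copies:
 {code}
 import java.io.ByteArrayOutputStream;
 import java.nio.ByteBuffer;
 import java.util.zip.GZIPOutputStream;

 // Illustrative only: shows how bridging OutputStream-based compression with
 // ByteBuffer-based message sets naturally piles up whole-payload copies.
 public class CompressionCopySketch {
     public static ByteBuffer compressWithSizePrefix(byte[] payload) throws Exception {
         ByteArrayOutputStream baos = new ByteArrayOutputStream();
         try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
             gzip.write(payload);                       // copy 1: payload into the stream's buffer
         }
         byte[] compressed = baos.toByteArray();        // copy 2: internal buffer -> new array

         ByteBuffer buffer = ByteBuffer.allocate(4 + compressed.length);
         buffer.putInt(compressed.length);              // the 4-byte size prefix
         buffer.put(compressed);                        // copy 3: array -> ByteBuffer
         buffer.flip();
         return buffer;                                 // callers that re-wrap or re-frame copy again
     }

     public static void main(String[] args) throws Exception {
         ByteBuffer framed = compressWithSizePrefix("hello kafka".getBytes("UTF-8"));
         System.out.println("framed size: " + framed.remaining() + " bytes");
     }
 }
 {code}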



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386021#comment-14386021
 ] 

Neha Narkhede commented on KAFKA-2046:
--

[~onurkaraman] Could [~clarkhaskins] reproduce the delete topic issue once you 
changed the controller.message.queue.size to its default size (infinite)?

 Delete topic still doesn't work
 ---

 Key: KAFKA-2046
 URL: https://issues.apache.org/jira/browse/KAFKA-2046
 Project: Kafka
  Issue Type: Bug
Reporter: Clark Haskins
Assignee: Onur Karaman

 I just attempted to delete a 128-partition topic with all inbound producers 
 stopped.
 The result was as follows:
 The /admin/delete_topics znode was empty
 The topic under /brokers/topics was removed
 The Kafka topics command showed that the topic was removed
 However, the data on disk on each broker was not deleted, and the topic has 
 not yet been re-created by starting up the inbound mirror maker.
 Let me know what additional information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1983) TestEndToEndLatency can be unreliable after hard kill

2015-03-29 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386028#comment-14386028
 ] 

Neha Narkhede commented on KAFKA-1983:
--

cc [~junrao]

 TestEndToEndLatency can be unreliable after hard kill
 -

 Key: KAFKA-1983
 URL: https://issues.apache.org/jira/browse/KAFKA-1983
 Project: Kafka
  Issue Type: Improvement
Reporter: Jun Rao
Assignee: Grayson Chao
  Labels: newbie

 If you hard kill TestEndToEndLatency, the committed offset remains the last 
 checkpointed one. However, more messages are appended after that checkpointed 
 offset. When TestEndToEndLatency is restarted, the consumer resumes from the 
 last checkpointed offset and reports artificially low latency, since it doesn't 
 need to wait for a new message to be produced before reading the next one.
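 One hedged way to make the measurement robust, sketched against a recent Java 
 consumer client (the exact seekToEnd/poll signatures vary by client version, and 
 this is not the actual test code): skip any backlog by seeking to the log end 
 before timing.
 {code}
 import java.time.Duration;
 import java.util.Collections;
 import java.util.Properties;
 import org.apache.kafka.clients.consumer.ConsumerRecords;
 import org.apache.kafka.clients.consumer.KafkaConsumer;
 import org.apache.kafka.common.TopicPartition;

 // Sketch only (not TestEndToEndLatency itself): skip any backlog left behind by
 // a hard kill by seeking to the log end before the timed produce/consume round.
 public class LatencyTestSeekSketch {
     public static void main(String[] args) {
         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");      // placeholder broker address
         props.put("group.id", "latency-test");
         props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
         props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

         TopicPartition tp = new TopicPartition("latency-topic", 0);  // hypothetical test topic
         try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
             consumer.assign(Collections.singletonList(tp));
             consumer.seekToEnd(Collections.singletonList(tp)); // ignore messages appended before the restart
             consumer.poll(Duration.ofMillis(0));               // trigger the (lazy) seek before timing starts

             // ... produce one message here and time how long until it is consumed ...
             ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
             System.out.println("records after seekToEnd: " + records.count());
         }
     }
 }
 {code}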



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2034) sourceCompatibility not set in Kafka build.gradle

2015-03-29 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2034:
-
Labels: newbie  (was: )

 sourceCompatibility not set in Kafka build.gradle
 -

 Key: KAFKA-2034
 URL: https://issues.apache.org/jira/browse/KAFKA-2034
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.2.1
Reporter: Derek Bassett
Priority: Minor
  Labels: newbie
   Original Estimate: 4h
  Remaining Estimate: 4h

 build.gradle does not explicitly set the sourceCompatibility version. As a 
 result, building Kafka with Java 1.8 emits class files for the wrong target 
 version, and it would also allow Java 1.8 features to be merged into Kafka.
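 For example, a build.gradle fragment along these lines would pin the bytecode 
 target regardless of which JDK runs the build (the target version shown is 
 illustrative, not necessarily what the project should pick):
 {code}
 // Illustrative build.gradle fragment; 1.7 is only an example target version.
 allprojects {
   apply plugin: 'java'
   sourceCompatibility = 1.7
   targetCompatibility = 1.7
 }
 {code}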



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2034) sourceCompatibility not set in Kafka build.gradle

2015-03-29 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2034:
-
Component/s: build

 sourceCompatibility not set in Kafka build.gradle
 -

 Key: KAFKA-2034
 URL: https://issues.apache.org/jira/browse/KAFKA-2034
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.2.1
Reporter: Derek Bassett
Priority: Minor
  Labels: newbie
   Original Estimate: 4h
  Remaining Estimate: 4h

 build.gradle does not explicitly set the sourceCompatibility version. As a 
 result, building Kafka with Java 1.8 emits class files for the wrong target 
 version, and it would also allow Java 1.8 features to be merged into Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-24 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378783#comment-14378783
 ] 

Neha Narkhede commented on KAFKA-2046:
--

[~clarkhaskins] As discussed previously, the minimum amount of information 
needed to troubleshoot any issue, not just delete topic, is:
1. Controller logs (possibly at TRACE)
2. Server logs
3. State change log (DEBUG works)

 Delete topic still doesn't work
 ---

 Key: KAFKA-2046
 URL: https://issues.apache.org/jira/browse/KAFKA-2046
 Project: Kafka
  Issue Type: Bug
Reporter: Clark Haskins
Assignee: Sriharsha Chintalapani

 I just attempted to delete a 128-partition topic with all inbound producers 
 stopped.
 The result was as follows:
 The /admin/delete_topics znode was empty
 The topic under /brokers/topics was removed
 The Kafka topics command showed that the topic was removed
 However, the data on disk on each broker was not deleted, and the topic has 
 not yet been re-created by starting up the inbound mirror maker.
 Let me know what additional information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1928) Move kafka.network over to using the network classes in org.apache.kafka.common.network

2015-03-16 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1928:
-
Issue Type: Sub-task  (was: Improvement)
Parent: KAFKA-1682

 Move kafka.network over to using the network classes in 
 org.apache.kafka.common.network
 ---

 Key: KAFKA-1928
 URL: https://issues.apache.org/jira/browse/KAFKA-1928
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps
Assignee: Gwen Shapira

 As part of the common package we introduced a bunch of network-related code 
 and abstractions.
 We should look into replacing a lot of what is in kafka.network with this 
 code. Duplicate classes include things like Receive, Send, etc. It is likely 
 also possible to refactor the SocketServer to make use of Selector, which 
 should significantly simplify its code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1928) Move kafka.network over to using the network classes in org.apache.kafka.common.network

2015-03-16 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1928:
-
Component/s: security

 Move kafka.network over to using the network classes in 
 org.apache.kafka.common.network
 ---

 Key: KAFKA-1928
 URL: https://issues.apache.org/jira/browse/KAFKA-1928
 Project: Kafka
  Issue Type: Improvement
  Components: security
Reporter: Jay Kreps
Assignee: Gwen Shapira

 As part of the common package we introduced a bunch of network-related code 
 and abstractions.
 We should look into replacing a lot of what is in kafka.network with this 
 code. Duplicate classes include things like Receive, Send, etc. It is likely 
 also possible to refactor the SocketServer to make use of Selector, which 
 should significantly simplify its code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-15 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362581#comment-14362581
 ] 

Neha Narkhede commented on KAFKA-1546:
--

[~aauradkar] Thanks for the patch. Overall, the changes look correct. I left a 
few review comments. And thanks for sharing the test results. Look forward to 
the updated patch.

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch, 
 KAFKA-1546_2015-03-12_13:42:01.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high- and low-volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will produce false positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate it into a number
 of messages dynamically, based on the throughput of the partition.
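 A back-of-the-envelope sketch of that translation (purely illustrative, not the 
 eventual implementation): convert the time-based bound into an allowed message 
 lag using the partition's observed throughput.
 {code}
 // Illustrative only: translate a time-based lag bound (replica.lag.max.ms-style)
 // into a message-count threshold from the partition's recent throughput.
 public class DynamicLagThresholdSketch {
     public static long allowedMessageLag(double messagesPerSecond, long lagMaxMs) {
         // e.g. 5,000 msg/s with a 10,000 ms bound => a replica may trail by 50,000 messages
         return Math.max(1L, (long) (messagesPerSecond * lagMaxMs / 1000.0));
     }

     public static boolean replicaIsOutOfSync(long leaderEndOffset, long replicaOffset,
                                              double messagesPerSecond, long lagMaxMs) {
         return (leaderEndOffset - replicaOffset) > allowedMessageLag(messagesPerSecond, lagMaxMs);
     }

     public static void main(String[] args) {
         // Low-volume topic: 2 msg/s -> threshold of 20 messages for a 10s bound.
         System.out.println(allowedMessageLag(2.0, 10_000));
         // High-volume topic: 50,000 msg/s -> threshold of 500,000 messages for the same bound.
         System.out.println(allowedMessageLag(50_000.0, 10_000));
     }
 }
 {code}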



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

