[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-12-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073048#comment-15073048
 ] 

ASF GitHub Bot commented on STORM-650:
--

Github user d2r commented on the pull request:

https://github.com/apache/storm/pull/406#issuecomment-167632103
  
@ogorun, how should we move forward with this work?  Has this issue been 
resolved as part of STORM-650, or does it still need to be addressed?


> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Epic
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>Assignee: Hugo Louro
>
> This is intended to be a parent/umbrella JIRA covering a number of 
> efforts/suggestions aimed at improving the storm-kafka module. The goal is to 
> facilitate communication and collaboration by providing a central point for 
> discussion and coordination.
> The first phase should be to identify and agree upon a list of high-level 
> points we would like to address. Once that is complete, we can move on to 
> implementation/design discussions, followed by an implementation plan, 
> division of labor, etc.
> A non-exhaustive, initial list of items follows. New/additional thoughts can 
> be proposed in the comments.
> * Improve API for Specifying the Kafka Starting point
> Configuring the kafka spout's starting position (e.g. forceFromStart=true) is 
> a common source of confusion. This should be refactored to provide an easy to 
> understand, unambiguous API for configuring this property.
> * Use Kafka APIs Instead of Internal ZK Metadata (STORM-590)
> Currently the Kafka spout relies on reading Kafka's internal metadata from 
> zookeeper. This should be refactored to use the Kafka Consumer API to protect 
> against changes to the internal metadata format stored in ZK.
> * Improve Error Handling
> There are a number of failure scenarios with the kafka spout that users may 
> want to react to differently based on their use case. Add a failure handler 
> API that allows users to implement and/or plug in alternative failure 
> handling implementations. It is assumed that default (sane) implementations 
> would be included and configured by default.
> * Configuration/General Refactoring (BrokerHosts, etc.) (STORM-631)
> (need to flesh this out better) Reduce unnecessary marker 
> interfaces/"instance of" checks. Unify configuration of core storm/trident 
> spout implementations.
> * Kafka Spout doesn't pick up from the beginning of the queue unless 
> forceFromStart specified (STORM-563)
> Discussion Items:
> * How important is backward compatibility?
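
To make the first item in the list above concrete, here is a minimal sketch of what an unambiguous starting-position setting could look like in place of a boolean such as forceFromStart. Every name in it (StartPosition, StartOffsetResolver, resolve) is hypothetical and does not exist in storm-kafka. The fallback to the earliest offset when no usable commit exists is an assumption, chosen to match the behaviour requested in STORM-563.

```java
import java.util.OptionalLong;

// Hypothetical names throughout; none of these types exist in storm-kafka.
enum StartPosition {
    EARLIEST,        // always begin at the earliest offset still retained
    LATEST,          // always begin at the log end, skipping existing messages
    LAST_COMMITTED   // resume from the committed offset when one is usable
}

final class StartOffsetResolver {
    private StartOffsetResolver() {}

    /**
     * Picks the offset to start reading from.
     *
     * @param committed offset previously stored for this partition, if any
     * @param earliest  earliest offset currently available in the partition
     * @param latest    offset of the log end for the partition
     */
    static long resolve(StartPosition position, OptionalLong committed,
                        long earliest, long latest) {
        switch (position) {
            case EARLIEST:
                return earliest;
            case LATEST:
                return latest;
            case LAST_COMMITTED:
                if (committed.isPresent()) {
                    long offset = committed.getAsLong();
                    // A stored offset outside the retained range is unusable.
                    if (offset >= earliest && offset <= latest) {
                        return offset;
                    }
                }
                // Assumed fallback: no usable commit means start from the beginning.
                return earliest;
            default:
                throw new IllegalArgumentException("Unknown position: " + position);
        }
    }
}
```

With an enum, each option names its own behaviour, so a call such as StartOffsetResolver.resolve(StartPosition.LAST_COMMITTED, OptionalLong.empty(), 10, 100) reads without needing to know what a true/false flag means.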



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-08-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716203#comment-14716203
 ] 

ASF GitHub Bot commented on STORM-650:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/703#issuecomment-135318226
  
@hsun-cnnxty 
Since STORM-650 is an epic/umbrella issue that no longer seems to be 
maintained, it would be better to file a new issue.
Also, please do not modify the version in pom.xml.


> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Epic
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-08-26 Thread Hang Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716165#comment-14716165
 ] 

Hang Sun commented on STORM-650:


Hi all,

I submitted https://github.com/apache/storm/pull/703

The PR is about storing spout state, including offsets, inside Kafka instead of 
ZK using Kafka's offset management API.  I am not sure whether it also falls 
within the scope of "Use Kafka APIs Instead of Internal ZK Metadata" 
(STORM-590).  I added it mainly to make the spout work with existing offset 
monitoring tools.  The change is completely backward compatible.

Thanks
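
For readers following along, one way a switch of offset storage can stay backward compatible is sketched below. This is not taken from PR 703. OffsetStore, InMemoryOffsetStore and MigratingOffsetStore are hypothetical names, and the in-memory map only stands in for ZooKeeper or for Kafka's offset management API. The assumption is that new commits go to the new store while reads fall back to the old store until a newer commit exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical abstraction over where spout offsets are persisted. */
interface OffsetStore {
    void commit(String topic, int partition, long offset);
    OptionalLong lastCommitted(String topic, int partition);
}

/** In-memory stand-in for a real backend (ZooKeeper or Kafka). */
final class InMemoryOffsetStore implements OffsetStore {
    private final Map<String, Long> offsets = new HashMap<>();

    private static String key(String topic, int partition) {
        return topic + "/" + partition;
    }

    @Override
    public void commit(String topic, int partition, long offset) {
        offsets.put(key(topic, partition), offset);
    }

    @Override
    public OptionalLong lastCommitted(String topic, int partition) {
        Long offset = offsets.get(key(topic, partition));
        return offset == null ? OptionalLong.empty() : OptionalLong.of(offset);
    }
}

/** Writes to the primary store only; reads fall back to the legacy store. */
final class MigratingOffsetStore implements OffsetStore {
    private final OffsetStore primary;
    private final OffsetStore legacy;

    MigratingOffsetStore(OffsetStore primary, OffsetStore legacy) {
        this.primary = primary;
        this.legacy = legacy;
    }

    @Override
    public void commit(String topic, int partition, long offset) {
        primary.commit(topic, partition, offset);
    }

    @Override
    public OptionalLong lastCommitted(String topic, int partition) {
        OptionalLong fromPrimary = primary.lastCommitted(topic, partition);
        return fromPrimary.isPresent()
                ? fromPrimary
                : legacy.lastCommitted(topic, partition);
    }
}
```

Under this design a topology that previously committed to the legacy store resumes from that offset on its first run, and the legacy data is never modified.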

> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Epic
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-06-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582177#comment-14582177
 ] 

ASF GitHub Bot commented on STORM-650:
--

Github user d2r commented on the pull request:

https://github.com/apache/storm/pull/406#issuecomment-92307
  
> Is it better to open separate JIRA issue or this pull request is 
considered as part of subjects discussed in STORM-650?

@ogorun  I am sorry I did not see the notification for your comment earlier. 
 Yes, I think the place to discuss this is STORM-650.  We might ask on 
STORM-650 whether we should add a new JIRA Issue for this to the STORM-650 
epic.  So far, I am not sure this specific issue is addressed in STORM-650, 
and it seems some Kafka API changes are coming as well.


> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Epic
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-22 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507877#comment-14507877
 ] 

Jay Kreps commented on STORM-650:
-

Great, really appreciate your looking at that!

> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-22 Thread Thomas Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507778#comment-14507778
 ] 

Thomas Becker commented on STORM-650:
-

Thanks [~jkreps]. 
As far as I can see, the new consumer API offers everything we need. In 
particular, the transparent leader lookup/handoff will simplify things.

> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-11 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491161#comment-14491161
 ] 

Jay Kreps commented on STORM-650:
-

Hey [~wurstmeister], yeah, that test case will be covered. That client is still 
under very active development and the system-level testing hasn't started yet, 
so I wouldn't worry too much about any current deficiencies aside from API 
shortcomings.

You are correct that subscribing to complete topics requires the server-side 
partition balancing feature that will be in 0.8.3. Partition-level consumption 
just depends on the fetch and metadata requests, which are unchanged in the 
0.8.x releases. I believe, though we would have to check, that offset commit is 
available from 0.8.1 on, if you guys would be using the offset commit APIs.

> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-11 Thread Thomas Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491104#comment-14491104
 ] 

Thomas Becker commented on STORM-650:
-

Thanks for the confirmation [~jkreps]. While evaluating the new consumer for 
storm I came across a small issue with the leadership handoff when subscribing 
to individual partitions. I’ve put my test case on 
[github|https://github.com/wurstmeister/kafka-client-test/blob/master/README.md].
 It would be great if you could let me know if ["test case 
2"|https://github.com/wurstmeister/kafka-client-test/blob/master/README.md#test-case-2-per-partition-subscription]
 is a valid test case or if I have misunderstood the documentation 
(http://kafka.apache.org/083/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html).
 

In terms of compatibility it looks like the new client is compatible with kafka 
< 0.8.3 if we only subscribe to TopicPartitions because that does not use the 
new join API. Is that correct?

Thanks

> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-08 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486599#comment-14486599
 ] 

Jay Kreps commented on STORM-650:
-

Leadership handoff is totally transparent from the point of view of that 
client. All failover is done within the client. When a leader fails you might 
see a slight blip in latency but the client handles the process of discovering 
the new leader and fetching data from it. This is true irrespective of whether 
you subscribe to individual partitions or to the whole topic.
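
As a rough illustration of what "all failover is done within the client" can mean, the sketch below hides leader rediscovery behind a single fetch call. It is a simplified model, not the Kafka client's implementation. Broker, LeaderLookup, NotLeaderException and FailoverFetcher are hypothetical names, and the retry limit is an assumed parameter.

```java
/** Signals that the contacted broker no longer leads the partition. */
class NotLeaderException extends Exception {
    NotLeaderException(String message) {
        super(message);
    }
}

/** Hypothetical view of a broker that serves one partition. */
interface Broker {
    String fetch(long offset) throws NotLeaderException;
}

/** Hypothetical metadata lookup that returns the current partition leader. */
interface LeaderLookup {
    Broker leaderFor(String topic, int partition);
}

final class FailoverFetcher {
    private final LeaderLookup lookup;
    private final String topic;
    private final int partition;
    private final int maxRetries;
    private Broker leader;

    FailoverFetcher(LeaderLookup lookup, String topic, int partition, int maxRetries) {
        this.lookup = lookup;
        this.topic = topic;
        this.partition = partition;
        this.maxRetries = maxRetries;
        this.leader = lookup.leaderFor(topic, partition);
    }

    /** Fetches one message; callers never see the leader change itself. */
    String fetch(long offset) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return leader.fetch(offset);
            } catch (NotLeaderException e) {
                // Refresh metadata and retry against the newly elected leader.
                leader = lookup.leaderFor(topic, partition);
            }
        }
        throw new IllegalStateException(
                "No leader for " + topic + "/" + partition
                        + " after " + maxRetries + " retries");
    }
}
```

The caller only observes the extra time spent in the retry, which corresponds to the slight blip in latency mentioned above.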

> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-08 Thread Thomas Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486437#comment-14486437
 ] 

Thomas Becker commented on STORM-650:
-

Hi Jay, 

Could you please point me to some documentation that describes how the 
KafkaConsumer handles a leader change?
Is there any difference in the behaviour depending on whether I subscribe to a 
whole topic or individual TopicPartitions? 

Thanks 

Thomas 

> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-04 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395841#comment-14395841
 ] 

Jay Kreps commented on STORM-650:
-

[~ptgoetz] Awesome, much appreciated.

> Storm-Kafka Refactoring and Improvements
> 
>
> Key: STORM-650
> URL: https://issues.apache.org/jira/browse/STORM-650
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka
>Reporter: P. Taylor Goetz
>


[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-04 Thread P. Taylor Goetz (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395839#comment-14395839
 ] 

P. Taylor Goetz commented on STORM-650:
---

[~jkreps] Never mind, I found the docs you referenced earlier.



[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-04 Thread P. Taylor Goetz (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395837#comment-14395837
 ] 

P. Taylor Goetz commented on STORM-650:
---

Thanks [~jkreps], I think that sounds like a great idea. Can you point to some 
docs for the new consumer API?

I'll try to review the API and our code to see what, if anything, we would need.



[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-04 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395777#comment-14395777
 ] 

Jay Kreps commented on STORM-650:
-

Hey [~wurstmeister] and [~ptgoetz], we are going to release the new Kafka consumer 
API in the next release. I believe it should cover the needs you guys have and 
should dramatically simplify the kafka-storm code. It internally handles server 
failure, partition migration, offset storage, etc., but gives you full control 
over partition assignment and offset commit points, which are what a 
stream processing system needs. This should also remove all unnecessary threading 
from your code, as the consumer is fully non-blocking. Prior to the Kafka 
release it would be great if someone who knows the storm-kafka integration well 
could do a deep dive and validate that these APIs would indeed cover your 
needs and that they would really significantly simplify your life. 
We think both should be true, but it would be good to check so we can make 
changes if needed. Once it is released, every change will break compatibility, 
so flushing these things out now is much easier.

I'd be happy to jump on a quick call to discuss if that is useful.
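
To make the "offset commit points" requirement concrete, here is a minimal sketch of the bookkeeping a spout needs when tuples are acked out of order. OffsetTracker and its methods are hypothetical names used only for illustration; they are not part of storm-kafka or of the Kafka consumer API. The sketch assumes the usual Kafka convention that a committed offset is the next offset to read.

```java
import java.util.TreeSet;

/**
 * Hypothetical helper (not a storm-kafka or Kafka class): tracks the in-flight
 * offsets of one partition and reports the next safe commit point.
 */
final class OffsetTracker {
    // Offsets that were emitted as tuples but not yet acked.
    private final TreeSet<Long> pending = new TreeSet<>();
    private long nextToEmit;

    OffsetTracker(long startOffset) {
        this.nextToEmit = startOffset;
    }

    /** Marks the next offset as emitted and returns it. */
    long emitNext() {
        long offset = nextToEmit++;
        pending.add(offset);
        return offset;
    }

    /** Marks the tuple for this offset as fully processed. */
    void ack(long offset) {
        pending.remove(offset);
    }

    /**
     * Returns the lowest offset that is not yet fully processed.
     * Everything below it has been acked, so it is safe to commit.
     */
    long committableOffset() {
        return pending.isEmpty() ? nextToEmit : pending.first();
    }
}
```

With offsets 100 to 102 emitted and only 101 acked, committableOffset() stays at 100, so a restart would replay from 100 instead of skipping it.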



[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-04-01 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391164#comment-14391164
 ] 

Sriharsha Chintalapani commented on STORM-650:
--

[~ptgoetz] Let's set a fix version for this JIRA so that we can move forward with 
a release in mind.



[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-03-31 Thread P. Taylor Goetz (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389908#comment-14389908
 ] 

P. Taylor Goetz commented on STORM-650:
---

After a lot of thought and a romp through the kafka spout source code, I think 
we should consider an aggressive refactor/rewrite of the Kafka spout.

[~wurstmeister] has put together a pretty good list in agile story format that 
I think should be considered as a starting point.

The bottom line is that IMHO, first class Kafka support is pretty critical, and 
I don't think we're there. If users are asking the same questions over and 
over, something needs to be fixed either in code or documentation. I think it's 
both.





[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-03-12 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358844#comment-14358844
 ] 

Sriharsha Chintalapani commented on STORM-650:
--

Thanks for summarizing, [~wurstmeister]. There are a lot of things here; instead 
of getting this in as one patch, can we do it progressively? 
We have a PR up for STORM-631. Is it conflicting with where we want to go with 
the kafka-connector? If not, we should get it in and open up additional JIRAs for 
further patches.
[~parth.brahmbhatt] [~ptgoetz] [~revans2] any thoughts? We are getting a lot of 
queries around the Kafka spout config. IMO it definitely needs refactoring, and I 
don't think this parent JIRA should hold back any progress.



[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-03-03 Thread Thomas Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345822#comment-14345822
 ] 

Thomas Becker commented on STORM-650:
-

I’ve tried to put the current suggestions into a story format that should help 
us create/update the appropriate tickets.
I've grouped the suggestions into things that affect the internals and things 
that affect users of the library. 

* As a storm developer I'd like to use the Kafka Metadata API for broker 
management to avoid having to know Kafka-internal ZooKeeper details 
(STORM-590/STORM-631)
* As a storm developer I'd like to use the ITuple interface consistently to avoid 
duplication (STORM-631)
* As a storm developer I'd like all loggers to be private (STORM-650)
* As a storm developer I'd like consistent, structured exception handling 
(STORM-650)
* As a storm developer I'd like to use the new Kafka consumer API (0.8.3) to 
reduce dependencies and use long-term supported Kafka APIs (STORM-650)
* As a storm developer I'd like to use the new Kafka producer API to reduce 
dependencies and use long-term supported Kafka APIs (STORM-650)
* As a storm developer I'd like to avoid having to use unnecessary marker 
interfaces (STORM-631)


* As an API client developer I'd like to be able to distinguish between internal 
and public APIs to avoid confusion (STORM-650)
* As an API client developer I'd like to be able to select the starting point in 
Kafka in an unambiguous way (STORM-563)
* As an API client developer I'd like all public APIs to be documented (STORM-650)
* As an API client developer I'd like to be able to use a pluggable failure 
handler (https://github.com/apache/storm/pull/406)
* As an API client developer I'd like to be able to use a single way of 
configuring storm and trident Kafka topologies (STORM-631)
* As an API client developer I'd like the Kafka-related configuration to be 
immutable (STORM-650)
* As an API client developer I'd like to be able to whitelist topics in the 
Kafka spout (STORM-650)
* As an API client developer I'd like to know the offset and partition for a 
message so I can audit and replay messages (STORM-697)

If we agree on this list, then I can go ahead and create/update the tickets 
so that we can move forward and start discussing further details in the 
individual tickets.

Thanks 
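
As one possible reading of the "pluggable failure handler" story above, a plug-in point could look roughly like the sketch below. FailureAction, FailureHandler and BoundedRetryHandler are hypothetical names invented for this illustration; they are not the API proposed in the pull request and not existing storm-kafka types.

```java
/** Hypothetical decision a handler can return for a failed message. */
enum FailureAction { RETRY, SKIP }

/** Hypothetical plug-in point; not an existing storm-kafka interface. */
interface FailureHandler {
    /**
     * @param offset    offset of the message whose tuple failed
     * @param failCount how many times this message has failed so far (1 = first failure)
     */
    FailureAction onFailure(long offset, int failCount);
}

/** A possible default: retry a bounded number of times, then skip the message. */
final class BoundedRetryHandler implements FailureHandler {
    private final int maxRetries;

    BoundedRetryHandler(int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.maxRetries = maxRetries;
    }

    @Override
    public FailureAction onFailure(long offset, int failCount) {
        // Retry while the number of failures is within the allowed budget.
        return failCount <= maxRetries ? FailureAction.RETRY : FailureAction.SKIP;
    }
}
```

Because the interface has a single method, a user could also supply a lambda such as (offset, failCount) -> FailureAction.SKIP when the use case prefers dropping failed messages over replaying them.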



[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-26 Thread Rick Kellogg (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339099#comment-14339099
 ] 

Rick Kellogg commented on STORM-650:


If and when we switch to the new consumer API, we need to be sure to document 
that existing offset values stored in ZooKeeper will NOT be preserved.



[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-26 Thread Rick Kellogg (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338487#comment-14338487
 ] 

Rick Kellogg commented on STORM-650:


I would also like to see the following changes:

* Public and internal classes should be separated into different packages, e.g. 
KafkaSpout (public) versus PartitionManager (internal).
* Instance variables inside of configuration classes (KafkaConfig, 
SpoutConfig) should be made private to enforce immutability.
* Logger instance variables should all be private, NOT public.
* Review the use of throw RuntimeException. Error handling is quite poor in some 
circumstances.
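
To illustrate the immutability point, a configuration class could keep its fields private and final and be populated through a builder, roughly as sketched here. ImmutableSpoutConfig, its fields and the default value are hypothetical and chosen only for the example; they do not describe the existing KafkaConfig or SpoutConfig classes.

```java
import java.util.Objects;

/** Hypothetical immutable spout configuration; names are illustrative only. */
final class ImmutableSpoutConfig {
    private final String topic;
    private final int fetchSizeBytes;

    private ImmutableSpoutConfig(Builder b) {
        this.topic = b.topic;
        this.fetchSizeBytes = b.fetchSizeBytes;
    }

    String topic() { return topic; }

    int fetchSizeBytes() { return fetchSizeBytes; }

    static Builder builder(String topic) { return new Builder(topic); }

    /** Mutable only while building; the built config cannot change afterwards. */
    static final class Builder {
        private final String topic;
        private int fetchSizeBytes = 1024 * 1024; // illustrative default, not a real one

        private Builder(String topic) {
            this.topic = Objects.requireNonNull(topic, "topic");
        }

        Builder fetchSizeBytes(int bytes) {
            if (bytes <= 0) {
                throw new IllegalArgumentException("fetchSizeBytes must be positive");
            }
            this.fetchSizeBytes = bytes;
            return this;
        }

        ImmutableSpoutConfig build() { return new ImmutableSpoutConfig(this); }
    }
}
```

A caller would write ImmutableSpoutConfig.builder("events").fetchSizeBytes(2048).build() and could then share the result between threads without defensive copies.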



[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334943#comment-14334943
 ] 

ASF GitHub Bot commented on STORM-650:
--

Github user ogorun commented on the pull request:

https://github.com/apache/storm/pull/406#issuecomment-75769012
  
Hi @d2r ,

Is it better to open a separate JIRA issue, or is this pull request considered 
part of the subjects discussed in STORM-650?




[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-23 Thread Parth Brahmbhatt (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333726#comment-14333726
 ] 

Parth Brahmbhatt commented on STORM-650:


I don't think we are proposing to use the high-level consumer. Currently Storm 
uses Kafka's internal ZK details to figure out a topic's partition -> List 
and leader broker mapping. STORM-650 is about changing that so we can use 
Kafka's admin API to get that information. The partition-to-consumer assignment 
is still handled by Storm's own PartitionManager.








[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-23 Thread Nathan Marz (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333715#comment-14333715
 ] 

Nathan Marz commented on STORM-650:
---

The Kafka consumer client wasn't used in the first place because it was too 
high-level. The consumers would figure out amongst themselves who would handle 
which partitions. Storm needs to be able to ensure that one given task is 
always responsible for a particular Kafka partition, and if that task 
dies/restarts it will *still* be responsible for that partition. You couldn't 
ensure that with the high-level API. If there's some intermediary API now that 
lets us avoid assuming details of Kafka's ZK format, I would be all for that.
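
To illustrate the requirement described above (a task always owns the same partitions, including after a restart), here is a minimal sketch of a deterministic assignment. It assumes partitions are numbered 0..n-1 and that each task has a stable index; the class and method names are hypothetical and this is not claimed to be the code storm-kafka uses.

```java
import java.util.ArrayList;
import java.util.List;

final class StaticPartitionAssignment {

    /**
     * Deals partitions out round-robin by task index. The result depends only
     * on the three arguments, so a restarted task computes the same list again.
     */
    static List<Integer> partitionsForTask(int taskIndex, int totalTasks, int numPartitions) {
        if (totalTasks < 1 || taskIndex < 0 || taskIndex >= totalTasks) {
            throw new IllegalArgumentException("invalid task index " + taskIndex
                    + " for " + totalTasks + " tasks");
        }
        List<Integer> owned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        int totalTasks = 3;
        int numPartitions = 8;
        for (int task = 0; task < totalTasks; task++) {
            System.out.println("task " + task + " owns "
                    + partitionsForTask(task, totalTasks, numPartitions));
        }
    }
}
```

Because no coordination between consumers is involved, no rebalance can move a partition to a different task, which is the property the high-level consumer could not guarantee.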






[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-23 Thread Thomas Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333632#comment-14333632
 ] 

Thomas Becker commented on STORM-650:
-

Hi

I think the list of items in the description sounds like a great starting point 
for discussing improvements.
To make some progress, I think it would be great if we could focus on one of 
the items first (I'd suggest starting with STORM-590) and then discuss 
further changes, e.g. configuration, error handling, etc. This should help 
kick-start the discussion.







[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333501#comment-14333501
 ] 

ASF GitHub Bot commented on STORM-650:
--

Github user d2r commented on the pull request:

https://github.com/apache/storm/pull/387#issuecomment-75582274
  
Do we want to close this pull request for the time being and re-open after 
discussion in STORM-650?







[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-23 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333456#comment-14333456
 ] 

Sriharsha Chintalapani commented on STORM-650:
--

[~frickelnix] [~parth.brahmbhatt] Since Parth already has a patch up, can we 
look at it and see whether it addresses all the dependent JIRAs?






[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-13 Thread Thomas Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321022#comment-14321022
 ] 

Thomas Becker commented on STORM-650:
-

Hi Jay,

Thanks for the feedback. I'll have a look at the new consumer API and see how 
we could make it fit, so that we can get an idea of a migration path.








[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-13 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321002#comment-14321002
 ] 

Jay Kreps commented on STORM-650:
-

Hey guys, from my pov the best thing to do would be to get on the new and 
improved Java APIs. We believe these should do what you want and should allow 
dramatically simplifying your code.

There are several incidental improvements you should get out of this: (1) no 
more Scala dependency, (2) clients are in their own jar so no pulling in server 
deps, (3) richer API which should make reaching around to ZK unnecessary. These 
are intended to be the long-term supported JVM APIs.

The new producer is ready to go and is in the 0.8.2 release. It is protocol 
compatible with any 0.8.x release so depending on this doesn't require users to 
upgrade their Kafka installation. It's much much faster than the previous 
producer in general but especially when doing synchronous acknowledgements.

The new consumer is on trunk now and is beta quality. It does not yet include 
the regular expression support and it does not yet allow automatic partition 
balancing (that is pending server-side features). Both those are coming, though 
(but I don't think you guys do regex now and you do your own partition 
assignment anyway). The advantage of this new client for the consumer is it 
will give you full control over your offset, includes support for committing 
offsets, and gives you full control over partition assignment but doesn't 
require you to manually discover brokers and manage failover which is hyper 
error-prone. The timeline for a production-quality release is probably about 3 
months. 

The producer APIs are still changeable as this is pre-release, so if there is 
any gap in what you would need, now would be a fantastic time to flag it. You 
can see the new APIs here:
http://kafka.apache.org/083/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

If anyone would like to have a brief conversation about the APIs I'd be happy 
to do that sometime after next week.






[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-03 Thread Xavier Stevens (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303852#comment-14303852
 ] 

Xavier Stevens commented on STORM-650:
--

I think it would be useful if the kafka spout implementations supported topic 
whitelisting by regular expression topic matching. Kafka supports this in their 
high-level Consumer API.
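
As a rough illustration of what whitelisting by regular expression could look like on the spout side, here is a minimal sketch that filters a list of topic names with java.util.regex. The TopicWhitelist class is hypothetical, and the sketch assumes the spout already has the list of existing topic names from somewhere; it does not show how Kafka's own consumer implements this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class TopicWhitelist {
    private final Pattern pattern;

    TopicWhitelist(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Returns the topics whose full name matches the whitelist pattern, in input order. */
    List<String> filter(List<String> topics) {
        List<String> matched = new ArrayList<>();
        for (String topic : topics) {
            // matches() requires the whole name to match, not just a substring.
            if (pattern.matcher(topic).matches()) {
                matched.add(topic);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        TopicWhitelist whitelist = new TopicWhitelist("logs\\..*");
        List<String> topics = Arrays.asList("logs.web", "logs.db", "metrics.cpu", "mylogs.web");
        System.out.println(whitelist.filter(topics));
    }
}
```

Matching the whole topic name rather than a substring is a deliberate choice in this sketch, so that a pattern such as logs\..* does not accidentally pick up a topic like mylogs.web.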






[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-03 Thread Olga Gorun (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302974#comment-14302974
 ] 

Olga Gorun commented on STORM-650:
--

Hi all,
Linking a variant of 'Improve Error Handling' implementation: 
https://github.com/apache/storm/pull/406






[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

2015-02-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302058#comment-14302058
 ] 

ASF GitHub Bot commented on STORM-650:
--

Github user ptgoetz commented on the pull request:

https://github.com/apache/storm/pull/387#issuecomment-72551762
  
STORM-650 created. Let's try to focus discussion/efforts there.




