[ https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333715#comment-14333715 ]

Nathan Marz commented on STORM-650:
-----------------------------------

The Kafka consumer client wasn't used in the first place because it was too 
high-level. The consumers would figure out amongst themselves who would handle 
which partitions. Storm needs to be able to ensure that a given task is 
always responsible for a particular Kafka partition, and that if that task 
dies/restarts it will *still* be responsible for that partition. You couldn't 
ensure that with the high-level API. If there's some intermediary API now that 
lets us avoid assuming details of Kafka's ZK format, I would be all for that. 
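
To make the assignment requirement above concrete, here is a minimal sketch, with illustrative names and a simple modulo scheme (not necessarily what storm-kafka actually does), of how a spout task can derive its partitions purely from its own task index, so a restarted task reclaims exactly the same partitions:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: deterministic partition ownership derived from the
// task index. A task that dies and restarts keeps its task index, so it always
// comes back owning the same partitions -- no consumer-group coordination needed.
public class DeterministicAssignmentSketch {

    /** Kafka partition ids owned by the spout task at the given index. */
    public static List<Integer> partitionsForTask(int taskIndex, int totalTasks,
                                                  int numPartitions) {
        List<Integer> owned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // With 8 partitions and 3 spout tasks, task 1 always owns partitions 1, 4, 7.
        System.out.println(partitionsForTask(1, 3, 8));
    }
}
{code}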

> Storm-Kafka Refactoring and Improvements
> ----------------------------------------
>
>                 Key: STORM-650
>                 URL: https://issues.apache.org/jira/browse/STORM-650
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka
>            Reporter: P. Taylor Goetz
>
> This is intended to be a parent/umbrella JIRA covering a number of 
> efforts/suggestions aimed at improving the storm-kafka module. The goal is to 
> facilitate communication and collaboration by providing a central point for 
> discussion and coordination.
> The first phase should be to identify and agree upon a list of high-level 
> points we would like to address. Once that is complete, we can move on to 
> implementation/design discussions, followed by an implementation plan, 
> division of labor, etc.
> A non-exhaustive, initial list of items follows. New/additional thoughts can 
> be proposed in the comments.
> * Improve API for Specifying the Kafka Starting Point
> Configuring the Kafka spout's starting position (e.g. forceFromStart=true) is 
> a common source of confusion. This should be refactored to provide an 
> easy-to-understand, unambiguous API for configuring this property (a sketch of 
> one possible shape follows after this description).
> * Use Kafka APIs Instead of Internal ZK Metadata (STORM-590)
> Currently the Kafka spout relies on reading Kafka's internal metadata from 
> ZooKeeper. This should be refactored to use the Kafka consumer API to protect 
> against changes to the internal metadata format stored in ZK (a sketch of 
> broker-side partition discovery follows after this description).
> * Improve Error Handling
> There are a number of failure scenarios with the Kafka spout that users may 
> want to react to differently depending on their use case. Add a failure-handler 
> API that lets users implement and/or plug in alternative failure-handling 
> strategies (a sketch of such an interface follows after this description). Sane 
> default implementations would be included and used unless overridden.
> * Configuration/General Refactoring (BrokerHosts, etc.) (STORM-631)
> (needs to be fleshed out better) Reduce unnecessary marker 
> interfaces/instanceof checks. Unify configuration of the core Storm and Trident 
> spout implementations.
> * Kafka Spout doesn't pick up from the beginning of the queue unless 
> forceFromStart is specified (STORM-563)
> Discussion Items:
> * How important is backward compatibility?
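
On the starting-point item above: a minimal sketch of one possible shape for an unambiguous replacement for forceFromStart. The class name, enum values, and setter name here are hypothetical, not an agreed design.

{code:java}
// Hypothetical configuration sketch: an explicit strategy instead of forceFromStart.
public class SpoutConfigSketch {

    /** Where the spout begins reading when the topology (re)starts. */
    public enum StartingOffset {
        EARLIEST,               // always start from the beginning of each partition
        LATEST,                 // always start from the end of each partition
        COMMITTED_OR_EARLIEST,  // resume from stored offsets; fall back to the beginning
        COMMITTED_OR_LATEST     // resume from stored offsets; fall back to the end
    }

    private StartingOffset startingOffset = StartingOffset.COMMITTED_OR_LATEST;

    public SpoutConfigSketch withStartingOffset(StartingOffset strategy) {
        this.startingOffset = strategy;
        return this;
    }

    public StartingOffset getStartingOffset() {
        return startingOffset;
    }
}
{code}

A config built this way reads as new SpoutConfigSketch().withStartingOffset(StartingOffset.EARLIEST), which is harder to misread than a forceFromStart boolean.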
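
On the ZK-metadata item: a sketch of partition discovery and offset seeking going through the Kafka client rather than ZooKeeper paths. This assumes a recent org.apache.kafka.clients.consumer.KafkaConsumer (which postdates this discussion), and the broker address and topic name are placeholders.

{code:java}
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class BrokerMetadataSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Partition metadata comes from the brokers, not from ZooKeeper paths.
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo info : consumer.partitionsFor("events")) { // placeholder topic
                partitions.add(new TopicPartition(info.topic(), info.partition()));
            }

            // assign() (rather than subscribe()) keeps partition ownership in the
            // spout's hands, matching the deterministic assignment discussed above.
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}
{code}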
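
On the error-handling item: a sketch of what a pluggable failure handler could look like. The interface name, method signature, and the bounded-retry default below are all hypothetical.

{code:java}
// Hypothetical plug-in point for reacting to failed tuples; names are illustrative.
public interface FailedMessageHandler {

    enum Action { RETRY, SKIP, HALT }

    /** Decide what the spout should do with a message whose processing failed. */
    Action onFail(String topic, int partition, long offset, int failureCount);
}

/** One sane default: retry a bounded number of times, then skip the message. */
class BoundedRetryHandler implements FailedMessageHandler {

    private final int maxRetries;

    BoundedRetryHandler(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    @Override
    public Action onFail(String topic, int partition, long offset, int failureCount) {
        return failureCount < maxRetries ? Action.RETRY : Action.SKIP;
    }
}
{code}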



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
