I would like to see:
* Leader election
* Ability to balance channels across a cluster
* A discussion or design around better fault-tolerance: if a worker goes
down, how would we process its data elsewhere? (a short-term solution
could be adding Kafka-based channels)
* A discussion or design for balancing work across a cluster: pulling
data from HDFS has to be done on a single node, but if processors
supported some notion of pending work and could balance it across the
cluster, that would be helpful. For pulling from HDFS, that would be
listing the paths to process, then pulling them in parallel and marking
the path/task finished. This should be fault-tolerant so even if a node
goes down, another node does the work (otherwise we could just use a
simple partitioning scheme).
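As a rough illustration of the pending-work idea in the last item, here is
a minimal sketch, not NiFi code. Everything in it is an assumption made for
illustration: the PendingWork class, its claim/finish methods, the
lease_seconds parameter, and the in-memory dict standing in for whatever
shared, fault-tolerant store a real cluster would need. Each listed path
becomes a task; a node claims a task under a time-limited lease and marks
it finished when done. If the node dies, its lease expires and another
node can claim the same path.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Task:
    path: str
    owner: Optional[str] = None   # node currently holding the lease, if any
    lease_expires: float = 0.0    # clock value after which the claim is void
    done: bool = False


class PendingWork:
    """Hypothetical task table. In a real cluster this state would live in
    a shared coordination service, not in one process's memory."""

    def __init__(self, paths: Iterable[str], lease_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._tasks: Dict[str, Task] = {p: Task(p) for p in paths}
        self._lease = lease_seconds
        self._clock = clock

    def claim(self, node: str) -> Optional[str]:
        """Give `node` the first unfinished path that is unclaimed or whose
        lease has expired; return None if nothing is available."""
        now = self._clock()
        for task in self._tasks.values():
            if task.done:
                continue
            if task.owner is None or task.lease_expires <= now:
                task.owner = node
                task.lease_expires = now + self._lease
                return task.path
        return None

    def finish(self, node: str, path: str) -> bool:
        """Mark `path` finished, but only if `node` still holds a live
        lease on it. A node that lost its lease gets False back."""
        task = self._tasks[path]
        now = self._clock()
        if task.done or task.owner != node or task.lease_expires <= now:
            return False
        task.done = True
        return True

    def remaining(self) -> List[str]:
        """Paths that have not been marked finished yet."""
        return [t.path for t in self._tasks.values() if not t.done]
```

The lease is what separates this from a simple partitioning scheme: work
is not tied permanently to the node that first picked it up. The sketch
ignores concurrency and persistence, which is where the real complexity
lies.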
Obviously, these get complicated quickly, but I think some of these
features would really help adoption.
rb
On 05/14/2015 07:29 PM, Joe Witt wrote:
All,
With the 0.1.0 release hopefully available soon, it is time to turn
towards the next release or so and get a sense of what we should focus
on.
This should include both 0.1.x and 0.2.0.
Obviously, first and foremost, we need to work through the existing PRs
and patches.
Beyond that we have slated the following so far:
0.1.1
https://issues.apache.org/jira/browse/NIFI/fixforversion/12332286/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
0.2.0
https://issues.apache.org/jira/browse/NIFI/fixforversion/12329653/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
But what I'd like to throw out for general discussion is this: what are
some of the bigger thematic things we should focus on? Things that will
help further with community growth and with the utility of the product
for those folks using it now?
For example:
I'd like to see us start digging into the cluster robustness issues
(HA cluster manager w/ legit leader election, etc.). But there are
other things as well that may be more important sooner.
Please share your thoughts, as this is a great time to affect those releases.
Thanks
Joe
--
Ryan Blue
Software Engineer
Cloudera, Inc.