Re: [PROPOSAL] Storm for Apache Incubator

Joe Stein Wed, 04 Sep 2013 19:40:27 -0700

sounds great

+1 to include storm-kafka under a contrib folder of the Apache Storm
project & for other modules (moving forward) based on community members
showing initiative in working on and maintaining them


On Wed, Sep 4, 2013 at 10:23 PM, Nathan Marz <nathan.m...@gmail.com> wrote:

> I think that storm-kafka would make sense as a contrib module since it's
> widely used. I'm not sure what to do with the other storm-contrib modules.
> I figure the less code that's part of the initial repo the better, because
> there will be less contribution/legal issues to sort out. How about this -
> we plan to include storm-kafka under a contrib folder of the Apache Storm
> project (just because a lot of people depend on it), and we can pull other
> storm-contrib modules in if community members show initiative in working on
> and maintaining them?
>
> If that all sounds good I'll update the proposal accordingly.
>
>
> On Sep 4, 2013, at 6:41 PM, Joe Stein <crypt...@gmail.com> wrote:
>
> > What does this mean for storm contribs
> > (https://github.com/nathanmarz/storm-contrib)? (spouts & bolts) E.g. for
> > the Apache Kafka spout, it is already hard to know which to use and which
> > is best for 0.7.X and 0.8.X-betaX... Is the Apache Storm project going to
> > help corral that, or is it only for Storm core, as the proposal implies
> > with only the storm code base https://github.com/nathanmarz/storm being
> > part of the project?
> >
> > A lot of traffic on the existing user list is about spouts (e.g. the
> > Kafka Spout), and I was not sure if that would still be talked about or
> > funneled somewhere else, or what the thoughts/plans were for the parts
> > built within Storm that exist now?
> >
> > /*******************************************
> > Joe Stein
> > Founder, Principal Consultant
> > Big Data Open Source Security LLC
> > http://www.stealth.ly
> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> >
> > On Wed, Sep 4, 2013 at 4:34 PM, Nathan Marz <nat...@nathanmarz.com> wrote:
> >
> >> We definitely need a storm-user list, as the existing Google Groups
> >> mailing list for Storm is quite active. So we'll need to transition that
> >> over. I agree on adding a storm-commits list and added it to the proposal.
> >>
> >>
> >> On Wed, Sep 4, 2013 at 11:50 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> >>
> >>> Excited about Storm coming to Apache. Small comment about the mailing
> >>> lists. You may want to propose having:
> >>> * storm-dev
> >>> * storm-commits
> >>> * storm-private (with moderated subscriptions)
> >>>
> >>> instead when starting in the incubator.
> >>>
> >>> However, since Storm is already a well-known open source project, it may
> >>> be valid to have storm-user from the beginning. But I think you may need a
> >>> storm-commits list to separate the commit log from dev discussions.
> >>> Mentors can chime in about this.
> >>>
> >>> Thanks,
> >>>
> >>> Henry
> >>>
> >>>
> >>>
> >>> On Wed, Sep 4, 2013 at 1:07 AM, Nathan Marz <nat...@nathanmarz.com> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> I'd like to propose Storm to be an Apache Incubator project. After much
> >>>> thought I believe this is the right next step for the project, and I look
> >>>> forward to hearing everyone's thoughts and feedback!
> >>>>
> >>>> Here's a link to the proposal:
> >>>> https://wiki.apache.org/incubator/StormProposal
> >>>>
> >>>> The proposal is also pasted below.
> >>>>
> >>>> -Nathan
> >>>>
> >>>>
> >>>> = Storm Proposal =
> >>>>
> >>>> == Abstract ==
> >>>>
> >>>> Storm is a distributed, fault-tolerant, and high-performance realtime
> >>>> computation system that provides strong guarantees on the processing of
> >>>> data.
> >>>>
> >>>> == Proposal ==
> >>>>
> >>>> Storm is a distributed real-time computation system. Similar to how Hadoop
> >>>> provides a set of general primitives for doing batch processing, Storm
> >>>> provides a set of general primitives for doing real-time computation. Its
> >>>> use cases span stream processing, distributed RPC, continuous computation,
> >>>> and more. Storm has been adopted by many organizations worldwide as a
> >>>> preferred technology for near-realtime big-data processing (see a partial
> >>>> list at https://github.com/nathanmarz/storm/wiki/Powered-By). As an open
> >>>> source project, Storm’s developer community has grown rapidly to 46
> >>>> members.
> >>>>
> >>>> == Background ==
> >>>>
> >>>> The past decade has seen a revolution in data processing. MapReduce,
> >>>> Hadoop, and related technologies have made it possible to store and process
> >>>> data at scales previously unthinkable. Unfortunately, these data processing
> >>>> technologies are not realtime systems, nor are they meant to be. The lack
> >>>> of a "Hadoop of realtime" has become the biggest hole in the data
> >>>> processing ecosystem. Storm fills that hole.
> >>>>
> >>>> Storm was initially developed and deployed at BackType in 2011. After 7
> >>>> months of development, BackType was acquired by Twitter in July 2011. Storm
> >>>> was open sourced in September 2011.
> >>>>
> >>>> Storm has been under continuous development on its GitHub repository since
> >>>> being open-sourced. It has undergone four major releases (0.5, 0.6, 0.7,
> >>>> 0.8) and many minor ones.
> >>>>
> >>>> == Rationale ==
> >>>>
> >>>> Storm is a general platform for low-latency big-data processing. It is
> >>>> complementary to existing Apache projects such as Hadoop. Many
> >>>> applications are in fact exploring the use of both Hadoop and Storm for
> >>>> big-data processing. Bringing Storm into Apache would benefit both the
> >>>> Apache community and the Storm community.
> >>>>
> >>>> The rapid growth of the Storm community has been driven by open source. We
> >>>> believe the Apache foundation is a great fit as the long-term home for
> >>>> Storm, as it provides an established process for community-driven
> >>>> development and decision making by consensus. This is exactly the model we
> >>>> want for future Storm development.
> >>>>
> >>>> == Initial Goals ==
> >>>>
> >>>>  * Move the existing codebase to Apache
> >>>>  * Integrate with the Apache development process
> >>>>  * Ensure all dependencies are compliant with Apache License version 2.0
> >>>>  * Incremental development and releases per Apache guidelines
> >>>>
> >>>> == Current Status ==
> >>>>
> >>>> Storm has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many minor
> >>>> ones. Storm 0.9 is about to be released. Storm is being used in production
> >>>> by over 50 organizations. The Storm codebase is currently hosted at
> >>>> github.com, which will seed the Apache git repository.
> >>>>
> >>>> === Meritocracy ===
> >>>>
> >>>> We plan to invest in supporting a meritocracy. We will discuss the
> >>>> requirements in an open forum. Several companies have already expressed
> >>>> interest in this project, and we intend to invite additional developers to
> >>>> participate. We will encourage and monitor community participation so that
> >>>> privileges can be extended to those that contribute.
> >>>>
> >>>> === Community ===
> >>>>
> >>>> The need for a low-latency big-data processing platform in the open source
> >>>> world is tremendous. Storm is currently being used by at least 50
> >>>> organizations worldwide (see
> >>>> https://github.com/nathanmarz/storm/wiki/Powered-By), and is the most
> >>>> starred Java project on GitHub. By bringing Storm into Apache, we believe
> >>>> that the community will grow even bigger.
> >>>>
> >>>> === Core Developers ===
> >>>>
> >>>> Storm was started by Nathan Marz at BackType, and now has developers from
> >>>> Yahoo!, Microsoft, Alibaba, Infochimps, and many other companies.
> >>>>
> >>>> === Alignment ===
> >>>>
> >>>> In the big-data processing ecosystem, Storm is a very popular low-latency
> >>>> platform, while Hadoop is the primary platform for batch processing. We
> >>>> believe that having Hadoop and Storm aligned within the Apache foundation
> >>>> will help the big-data community grow further. The alignment is also
> >>>> beneficial to other Apache communities (such as Zookeeper, Thrift, and
> >>>> Mesos). We could include additional sub-projects, Storm-on-YARN and
> >>>> Storm-on-Mesos, in the near future.
> >>>>
> >>>> == Known Risks ==
> >>>>
> >>>> === Orphaned Products ===
> >>>>
> >>>> The risk of the Storm project being abandoned is minimal. At least 50
> >>>> organizations (Twitter, Yahoo!, Microsoft, Groupon, Baidu, Alibaba,
> >>>> Alipay, Taobao, PARC, RocketFuel, etc.) are highly incentivized to
> >>>> continue development. Many of these organizations have built critical
> >>>> business applications upon Storm, and have made significant internal
> >>>> infrastructure investments in Storm.
> >>>>
> >>>> === Inexperience with Open Source ===
> >>>>
> >>>> Storm has existed as a healthy open source project for several years.
> >>>> During that time, we have successfully curated an open-source community,
> >>>> attracting over 40 developers from a diverse group of companies including
> >>>> Twitter, Yahoo!, and Alibaba.
> >>>>
> >>>> === Homogenous Developers ===
> >>>>
> >>>> The initial committers are employed by large companies (including Twitter,
> >>>> Yahoo!, Alibaba, Microsoft) and well-funded startups. Storm has an active
> >>>> community of developers, and we are committed to recruiting additional
> >>>> committers based on their contributions to the project.
> >>>>
> >>>> === Reliance on Salaried Developers ===
> >>>>
> >>>> It is expected that Storm development will occur on both salaried time and
> >>>> on volunteer time, after hours. The majority of initial committers are paid
> >>>> by their employer to contribute to this project. However, they are all
> >>>> passionate about the project, and we are confident that the project will
> >>>> continue even if no salaried developers contribute to the project. We are
> >>>> committed to recruiting additional committers including non-salaried
> >>>> developers.
> >>>>
> >>>> === Relationships with Other Apache Products ===
> >>>>
> >>>> As mentioned in the Alignment section, Storm is closely integrated with
> >>>> Hadoop, Zookeeper, Thrift, YARN and Mesos in numerous ways. We look forward
> >>>> to collaborating with those communities, as well as other Apache
> >>>> communities (including Apache S4, which focuses on stateful low-latency
> >>>> processing).
> >>>>
> >>>> === An Excessive Fascination with the Apache Brand ===
> >>>>
> >>>> Storm is already a healthy and well known open source project. This
> >>>> proposal is not for the purpose of generating publicity. Rather, the
> >>>> primary benefits to joining Apache are those outlined in the Rationale
> >>>> section.
> >>>>
> >>>> == Documentation ==
> >>>>
> >>>> The reader will find these websites highly relevant:
> >>>>
> >>>>  * Storm website: http://storm-project.net
> >>>>  * Storm documentation: https://github.com/nathanmarz/storm/wiki
> >>>>  * Codebase: https://github.com/nathanmarz/storm
> >>>>  * User group: https://groups.google.com/group/storm-user
> >>>>
> >>>> == Source and Intellectual Property Submission Plan ==
> >>>>
> >>>> The Storm codebase is currently hosted on Github:
> >>>> https://github.com/nathanmarz/storm.
> >>>>
> >>>> This is the exact codebase that we would migrate to the Apache foundation.
> >>>>
> >>>> The Storm source code is currently licensed under the Eclipse Public
> >>>> License Version 1.0. Some source code was contributed under a contributor
> >>>> agreement based on the Sun contributor agreement (v1.5). More recent code
> >>>> has been contributed under an Apache style agreement (see
> >>>> https://dl.dropboxusercontent.com/u/133901206/storm-apache-style-cla.txt).
> >>>>
> >>>> Upon entering Apache, Storm will migrate to the Apache License 2.0, with
> >>>> all contributions licensed to the Apache Foundation. In certain cases where
> >>>> individuals or organizations hold copyright, we will ensure they grant a
> >>>> license to the Apache Foundation. Going forward, all commits will be
> >>>> licensed directly to the Apache foundation through our signed Individual
> >>>> Contributor License Agreements for all committers on the project.
> >>>>
> >>>> Yahoo! is also willing to move the Storm-on-YARN code from GitHub to be a
> >>>> subproject of the Apache Storm project. Storm-on-YARN is currently licensed
> >>>> under Apache License 2.0 and receives contributions under an Apache style
> >>>> CLA. Upon entering Apache, Yahoo! will sign over copyright to the Apache
> >>>> foundation.
> >>>>
> >>>> == External Dependencies ==
> >>>>
> >>>> To the best of our knowledge, all of Storm's dependencies (except 0MQ/JMQ)
> >>>> are distributed under Apache-compatible licenses. Upon acceptance to the
> >>>> incubator, we would begin a thorough analysis of all transitive
> >>>> dependencies to verify this fact and introduce license checking into the
> >>>> build and release process (for instance, integrating Apache Rat).
> >>>>
> >>>> Storm has used 0MQ and JMQ as the default mechanism for its internal
> >>>> messaging layer, and 0MQ/JMQ is licensed under the GNU Lesser General
> >>>> Public License. Recently, we have made Storm's messaging layer pluggable,
> >>>> and plan to use Netty (which is licensed under Apache License v2) as our
> >>>> default messaging plugin (while keeping 0MQ as an optional plugin).
> >>>>
> >>>> == Cryptography ==
> >>>>
> >>>> We do not expect Storm to be a controlled export item due to the use of
> >>>> encryption.
> >>>>
> >>>> Storm enables encryption via two types of plugins:
> >>>>
> >>>>  * SASL authentication plugins: currently, we provide “no-op”
> >>>> authentication and digest authentication. In the near future, we will
> >>>> introduce Kerberos authentication.
> >>>>  * Tuple payload serialization plugins: Storm provides plugins for
> >>>> plain-object serialization and Blowfish encryption.
> >>>>
> >>>> == Required Resources ==
> >>>>
> >>>> === Mailing lists ===
> >>>>
> >>>> * storm-user
> >>>> * storm-dev
> >>>> * storm-private (with moderated subscriptions)
> >>>>
> >>>> === Subversion Directory ===
> >>>>
> >>>> Git is the preferred source control system: git://git.apache.org/storm
> >>>>
> >>>> === Issue Tracking ===
> >>>>
> >>>> JIRA Storm (STORM)
> >>>>
> >>>> == Initial Committers ==
> >>>>
> >>>>  * Nathan Marz <nathan at nathanmarz dot com>
> >>>>  * James Xu <xumingmingv at gmail dot com>
> >>>>  * Jason Jackson <jason at cvk dot ca>
> >>>>  * Andy Feng <afeng at yahoo-inc dot com>
> >>>>  * Flip Kromer  <flip at infochimps dot com>
> >>>>  * David Lao <davidlao at microsoft dot com>
> >>>>  * P. Taylor Goetz <ptgoetz at gmail dot com>
> >>>>
> >>>> == Affiliations ==
> >>>>
> >>>>  * Nathan Marz - Nathan’s Startup
> >>>>  * James Xu - Alibaba
> >>>>  * Jason Jackson - Twitter
> >>>>  * Andy Feng - Yahoo!
> >>>>  * Flip Kromer - Infochimps
> >>>>  * David Lao - Microsoft
> >>>>  * P. Taylor Goetz - Health Market Science
> >>>>
> >>>> == Sponsors ==
> >>>>
> >>>> === Champion ===
> >>>>
> >>>>  * Doug Cutting  <cutting at apache dot org>
> >>>>
> >>>> === Nominated Mentors ===
> >>>>
> >>>> * Ted Dunning <tdunning at maprtech.com>
> >>>> * Arvind Prabhakar <arvind at apache dot org>
> >>>> * Devaraj Das <ddas at hortonworks dot com>
> >>>>
> >>>> === Sponsoring Entity ===
> >>>>
> >>>> The Apache Incubator
> >>
> >>
> >>
> >> --
> >> Twitter: @nathanmarz
> >> http://nathanmarz.com
> >>
>
