Re: [VOTE] Accept Beam into the Apache Incubator

Edward J. Yoon Thu, 28 Jan 2016 15:10:26 -0800

+1 (binding).

On Fri, Jan 29, 2016 at 7:51 AM, Gregory Chase <gch...@pivotal.io> wrote:
> + 1 (non-binding), and cool name!
>
> On Thu, Jan 28, 2016 at 2:47 PM, Byung-Gon Chun <bgc...@gmail.com> wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> On Fri, Jan 29, 2016 at 5:31 AM, Adunuthula, Seshu <sadunuth...@ebay.com>
>> wrote:
>>
>> > +1 (non-binding)
>> >
>> > On 1/28/16, 12:05 PM, "Julian Hyde" <jh...@apache.org> wrote:
>> >
>> > >+1 (binding)
>> > >
>> > >> On Jan 28, 2016, at 10:42 AM, Mayank Bansal <maban...@gmail.com>
>> wrote:
>> > >>
>> > >> +1 (non-binding)
>> > >>
>> > >> Thanks,
>> > >> Mayank
>> > >>
>> > >> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
>> > >> venkat...@innerzeal.com> wrote:
>> > >>
>> > >>> +1 (binding).
>> > >>>
>> > >>> Thanks!
>> > >>>
>> > >>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning <ted.dunn...@gmail.com>
>> > >>> wrote:
>> > >>>
>> > >>>> +1
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament
>> > >>>><johndam...@apache.org>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> +1
>> > >>>>>
>> > >>>>> On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré
>> > >>>>><j...@nanthrax.net>
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>>> Hi,
>> > >>>>>>
>> > >>>>>> the Beam proposal (initially Dataflow) was proposed last week.
>> > >>>>>>
>> > >>>>>> The complete discussion thread is available here:
>> > >>>>>>
>> > >>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E
>> > >>>>>>
>> > >>>>>> As a reminder, the BeamProposal is here:
>> > >>>>>>
>> > >>>>>> https://wiki.apache.org/incubator/BeamProposal
>> > >>>>>>
>> > >>>>>> Given all the great feedback we received on the mailing list, we
>> > >>>>>> think it's time to call a vote to accept Beam into the Incubator.
>> > >>>>>>
>> > >>>>>> Please cast your vote to:
>> > >>>>>> [] +1 - accept Apache Beam as a new incubating project
>> > >>>>>> []  0 - not sure
>> > >>>>>> [] -1 - do not accept the Apache Beam project (because: ...)
>> > >>>>>>
>> > >>>>>> Thanks,
>> > >>>>>> Regards
>> > >>>>>> JB
>> > >>>>>> ----
>> > >>>>>> ## page was renamed from DataflowProposal
>> > >>>>>> = Apache Beam =
>> > >>>>>>
>> > >>>>>> == Abstract ==
>> > >>>>>>
>> > >>>>>> Apache Beam is an open source, unified model and set of
>> > >>>>>> language-specific SDKs for defining and executing data processing
>> > >>>>>> workflows, and also data ingestion and integration flows,
>> supporting
>> > >>>>>> Enterprise Integration Patterns (EIPs) and Domain Specific
>> Languages
>> > >>>>>> (DSLs). Dataflow pipelines simplify the mechanics of large-scale
>> > >>> batch
>> > >>>>>> and streaming data processing and can run on a number of runtimes
>> > >>> like
>> > >>>>>> Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud
>> > >>>> service).
>> > >>>>>> Beam also brings DSLs in different languages, allowing users to
>> > >>>>>>easily
>> > >>>>>> implement their data integration processes.
>> > >>>>>>
>> > >>>>>> == Proposal ==
>> > >>>>>>
>> > >>>>>> Beam is a simple, flexible, and powerful system for distributed
>> data
>> > >>>>>> processing at any scale. Beam provides a unified programming
>> model,
>> > >>>>>>a
>> > >>>>>> software development kit to define and construct data processing
>> > >>>>>> pipelines, and runners to execute Beam pipelines in several
>> runtime
>> > >>>>>> engines, like Apache Spark, Apache Flink, or Google Cloud
>> Dataflow.
>> > >>>> Beam
>> > >>>>>> can be used for a variety of streaming or batch data processing
>> > >>>>>>goals
>> > >>>>>> including ETL, stream analysis, and aggregate computation. The
>> > >>>>>> underlying programming model for Beam provides MapReduce-like
>> > >>>>>> parallelism, combined with support for powerful data windowing,
>> and
>> > >>>>>> fine-grained correctness control.
>> > >>>>>>
>> > >>>>>> == Background ==
>> > >>>>>>
>> > >>>>>> Beam started as a set of Google projects (Google Cloud Dataflow)
>> > >>>> focused
>> > >>>>>> on making data processing easier, faster, and less costly. The
>> Beam
>> > >>>>>> model is a successor to MapReduce, FlumeJava, and MillWheel inside
>> > >>>>>> Google and is focused on providing a unified solution for batch
>> and
>> > >>>>>> stream processing. These projects on which Beam is based have been
>> > >>>>>> published in several papers made available to the public:
>> > >>>>>>
>> > >>>>>>  * MapReduce - http://research.google.com/archive/mapreduce.html
>> > >>>>>>  * Dataflow model  -
>> > >>> http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
>> > >>>>>>  * FlumeJava - http://research.google.com/pubs/pub35650.html
>> > >>>>>>  * MillWheel - http://research.google.com/pubs/pub41378.html
>> > >>>>>>
>> > >>>>>> Beam was designed from the start to provide a portable programming
>> > >>>>>> layer. When you define a data processing pipeline with the Beam
>> > >>> model,
>> > >>>>>> you are creating a job which is capable of being processed by any
>> > >>>> number
>> > >>>>>> of Beam processing engines. Several engines have been developed to
>> > >>> run
>> > >>>>>> Beam pipelines in other open source runtimes, including a Beam
>> > >>>>>>runner
>> > >>>>>> for Apache Flink and Apache Spark. There is also a "direct runner", for
>> > >>>>>> execution on the developer machine (mainly for dev/debug
>> purposes).
>> > >>>>>> Another runner allows a Beam program to run on a managed service,
>> > >>>> Google
>> > >>>>>> Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is
>> > >>>>>> already available on GitHub, and independent from the Google Cloud
>> > >>>>>> Dataflow service. Another Python SDK is currently in active
>> > >>>> development.
>> > >>>>>>
>> > >>>>>> In this proposal, the Beam SDKs, model, and a set of runners will
>> be
>> > >>>>>> submitted as an OSS project under the ASF. The runners which are a
>> > >>> part
>> > >>>>>> of this proposal include those for Spark (from Cloudera), Flink
>> > >>>>>>(from
>> > >>>>>> data Artisans), and local development (from Google); the Google
>> > >>>>>>Cloud
>> > >>>>>> Dataflow service runner is not included in this proposal. Further
>> > >>>>>> references to Beam will refer to the Dataflow model, SDKs, and
>> > >>> runners
>> > >>>>>> which are a part of this proposal (Apache Beam) only. The initial
>> > >>>>>> submission will contain the already-released Java SDK; Google
>> > >>>>>>intends
>> > >>>> to
>> > >>>>>> submit the Python SDK later in the incubation process. The Google
>> > >>> Cloud
>> > >>>>>> Dataflow service will continue to be one of many runners for Beam,
>> > >>>> built
>> > >>>>>> on Google Cloud Platform, to run Beam pipelines. Necessarily,
>> Cloud
>> > >>>>>> Dataflow will develop against the Apache project additions,
>> updates,
>> > >>>> and
>> > >>>>>> changes. Google Cloud Dataflow will become one user of Apache Beam
>> > >>> and
>> > >>>>>> will participate in the project openly and publicly.
>> > >>>>>>
>> > >>>>>> The Beam programming model has been designed with simplicity,
>> > >>>>>> scalability, and speed as key tenets. In the Beam model, you only
>> > >>> need
>> > >>>>>> to think about four top-level concepts when constructing your data
>> > >>>>>> processing job:
>> > >>>>>>
>> > >>>>>>  * Pipelines - The data processing job made of a series of
>> > >>>> computations
>> > >>>>>> including input, processing, and output
>> > >>>>>>  * PCollections - Bounded (or unbounded) datasets which represent
>> > >>> the
>> > >>>>>> input, intermediate and output data in pipelines
>> > >>>>>>  * PTransforms - A data processing step in a pipeline in which one
>> > >>> or
>> > >>>>>> more PCollections are an input and output
>> > >>>>>>  * I/O Sources and Sinks - APIs for reading and writing data which
>> > >>> are
>> > >>>>>> the roots and endpoints of the pipeline
>> > >>>>>>
>> > >>>>>> == Rationale ==
>> > >>>>>>
>> > >>>>>> With Google Dataflow, Google intended to develop a framework which
>> > >>>>>> allowed developers to be maximally productive in defining the
>> > >>>>>> processing, and then be able to execute the program at various
>> > >>>>>>levels
>> > >>>> of
>> > >>>>>> latency/cost/completeness without re-architecting or re-writing
>> it.
>> > >>>> This
>> > >>>>>> goal was informed by Google's past experience developing several
>> > >>>>>> models, frameworks, and tools useful for large-scale and
>> distributed
>> > >>>>>> data processing. While Google has previously published papers
>> > >>>> describing
>> > >>>>>> some of its technologies, Google decided to take a different
>> > >>>>>>approach
>> > >>>>>> with Dataflow. Google open-sourced the SDK and model alongside
>> > >>>>>> commercialization of the idea and ahead of publishing papers on
>> the
>> > >>>>>> topic. As a result, a number of open source runtimes exist for
>> > >>>> Dataflow,
>> > >>>>>> such as the Apache Flink and Apache Spark runners.
>> > >>>>>>
>> > >>>>>> We believe that submitting Beam as an Apache project will provide
>> an
>> > >>>>>> immediate, worthwhile, and substantial contribution to the open
>> > >>> source
>> > >>>>>> community. As an incubating project, we believe Dataflow will
>> have a
>> > >>>>>> better opportunity to provide a meaningful contribution to OSS and
>> > >>> also
>> > >>>>>> integrate with other Apache projects.
>> > >>>>>>
>> > >>>>>> In the long term, we believe Beam can be a powerful abstraction
>> > >>>>>>layer
>> > >>>>>> for data processing. By providing an abstraction layer for data
>> > >>>>>> pipelines and processing, data workflows can be increasingly
>> > >>> portable,
>> > >>>>>> resilient to breaking changes in tooling, and compatible across
>> many
>> > >>>>>> execution engines, runtimes, and open source projects.
>> > >>>>>>
>> > >>>>>> == Initial Goals ==
>> > >>>>>>
>> > >>>>>> We are breaking our initial goals into immediate (< 2 months),
>> > >>>>>> short-term (2-4 months), and intermediate-term (> 4 months).
>> > >>>>>>
>> > >>>>>> Our immediate goals include the following:
>> > >>>>>>
>> > >>>>>>  * Plan for reconciling the Dataflow Java SDK and various runners
>> > >>> into
>> > >>>>>> one project
>> > >>>>>>  * Plan for refactoring the existing Java SDK for better
>> > >>> extensibility
>> > >>>>>> by SDK and runner writers
>> > >>>>>>  * Validating all dependencies are ASL 2.0 or compatible
>> > >>>>>>  * Understanding and adapting to the Apache development process
>> > >>>>>>
>> > >>>>>> Our short-term goals include:
>> > >>>>>>
>> > >>>>>>  * Moving the newly-merged lists, and build utilities to Apache
>> > >>>>>>  * Start refactoring codebase and move code to Apache Git repo
>> > >>>>>>  * Continue development of new features, functions, and fixes in
>> the
>> > >>>>>> Dataflow Java SDK, and Dataflow runners
>> > >>>>>>  * Cleaning up the Dataflow SDK sources and crafting a roadmap and
>> > >>>> plan
>> > >>>>>> for how to include new major ideas, modules, and runtimes
>> > >>>>>>  * Establishment of easy and clear build/test framework for
>> Dataflow
>> > >>>>>> and associated runtimes; creation of testing, rollback, and
>> > >>> validation
>> > >>>>>> policy
>> > >>>>>>  * Analysis and design for work needed to make Beam a better data
>> > >>>>>> processing abstraction layer for multiple open source frameworks
>> and
>> > >>>>>> environments
>> > >>>>>>
>> > >>>>>> Finally, we have a number of intermediate-term goals:
>> > >>>>>>
>> > >>>>>>  * Roadmapping, planning, and execution of integrations with other
>> > >>> OSS
>> > >>>>>> and non-OSS projects/products
>> > >>>>>>  * Inclusion of additional SDK for Python, which is under active
>> > >>>>>> development
>> > >>>>>>
>> > >>>>>> == Current Status ==
>> > >>>>>>
>> > >>>>>> === Meritocracy ===
>> > >>>>>>
>> > >>>>>> Dataflow was initially developed based on ideas from many
>> employees
>> > >>>>>> within Google. As an ASL OSS project on GitHub, the Dataflow SDK
>> has
>> > >>>>>> received contributions from data Artisans, Cloudera Labs, and
>> other
>> > >>>>>> individual developers. As a project under incubation, we are
>> > >>> committed
>> > >>>>>> to expanding our effort to build an environment which supports a
>> > >>>>>> meritocracy. We are focused on engaging the community and other
>> > >>> related
>> > >>>>>> projects for support and contributions. Moreover, we are committed to
>> > >>>>>> ensuring contributors and committers to Dataflow come from a broad mix
>> > >>>>>> of organizations through a merit-based decision process during
>> > >>>>>> incubation.
>> > >>>>>> We believe strongly in the Beam model and are committed to growing
>> > >>>>>>an
>> > >>>>>> inclusive community of Beam contributors.
>> > >>>>>>
>> > >>>>>> === Community ===
>> > >>>>>>
>> > >>>>>> The core of the Dataflow Java SDK has been developed by Google for
>> > >>> use
>> > >>>>>> with Google Cloud Dataflow. Google has active community engagement
>> > >>>>>>in
>> > >>>>>> the SDK GitHub repository
>> > >>>>>> (https://github.com/GoogleCloudPlatform/DataflowJavaSDK), on
>> Stack
>> > >>>>>> Overflow
>> > >>>>>> (http://stackoverflow.com/questions/tagged/google-cloud-dataflow)
>> > >>> and
>> > >>>>>> has had contributions from a number of organizations and individuals.
>> > >>>>>>
>> > >>>>>> Every day, Cloud Dataflow is actively used by a number of
>> > >>> organizations
>> > >>>>>> and institutions for batch and stream processing of data. We
>> believe
>> > >>>>>> acceptance will allow us to consolidate existing Dataflow-related
>> > >>> work,
>> > >>>>>> grow the Dataflow community, and deepen connections between
>> Dataflow
>> > >>>> and
>> > >>>>>> other open source projects.
>> > >>>>>>
>> > >>>>>> === Core Developers ===
>> > >>>>>>
>> > >>>>>> The core developers for Dataflow and the Dataflow runners are:
>> > >>>>>>
>> > >>>>>>  * Frances Perry
>> > >>>>>>  * Tyler Akidau
>> > >>>>>>  * Davor Bonaci
>> > >>>>>>  * Luke Cwik
>> > >>>>>>  * Ben Chambers
>> > >>>>>>  * Kenn Knowles
>> > >>>>>>  * Dan Halperin
>> > >>>>>>  * Daniel Mills
>> > >>>>>>  * Mark Shields
>> > >>>>>>  * Craig Chambers
>> > >>>>>>  * Maximilian Michels
>> > >>>>>>  * Tom White
>> > >>>>>>  * Josh Wills
>> > >>>>>>  * Robert Bradshaw
>> > >>>>>>
>> > >>>>>> === Alignment ===
>> > >>>>>>
>> > >>>>>> The Beam SDK can be used to create Beam pipelines which can be
>> > >>> executed
>> > >>>>>> on Apache Spark or Apache Flink. Beam is also related to other
>> > >>>>>>Apache
>> > >>>>>> projects, such as Apache Crunch. We plan on expanding
>> functionality
>> > >>> for
>> > >>>>>> Beam runners, support for additional domain specific languages,
>> and
>> > >>>>>> increased portability so Beam is a powerful abstraction layer for
>> > >>> data
>> > >>>>>> processing.
>> > >>>>>>
>> > >>>>>> == Known Risks ==
>> > >>>>>>
>> > >>>>>> === Orphaned Products ===
>> > >>>>>>
>> > >>>>>> The Dataflow SDK is presently used by several organizations, from
>> > >>> small
>> > >>>>>> startups to Fortune 100 companies, to construct production
>> pipelines
>> > >>>>>> which are executed in Google Cloud Dataflow. Google has a
>> long-term
>> > >>>>>> commitment to advance the Dataflow SDK; moreover, Dataflow is
>> seeing
>> > >>>>>> increasing interest, development, and adoption from organizations
>> > >>>>>> outside of Google.
>> > >>>>>>
>> > >>>>>> === Inexperience with Open Source ===
>> > >>>>>>
>> > >>>>>> Google believes strongly in open source and the exchange of
>> > >>> information
>> > >>>>>> to advance new ideas and work. Examples of this commitment are
>> > >>>>>>active
>> > >>>>>> OSS projects such as Chromium (https://www.chromium.org) and
>> > >>>> Kubernetes
>> > >>>>>> (http://kubernetes.io/). With Dataflow, we have tried to be
>> > >>>> increasingly
>> > >>>>>> open and forward-looking; we have published a paper in the VLDB
>> > >>>>>> conference describing the Dataflow model
>> > >>>>>> (http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf) and were quick
>> to
>> > >>>>>> release the Dataflow SDK as open source software with the launch
>> of
>> > >>>>>> Cloud Dataflow. Our submission to the Apache Software Foundation
>> is
>> > >>>>>>a
>> > >>>>>> logical extension of our commitment to open source software.
>> > >>>>>>
>> > >>>>>> === Homogeneous Developers ===
>> > >>>>>>
>> > >>>>>> The majority of committers in this proposal belong to Google due
>> to
>> > >>> the
>> > >>>>>> fact that Dataflow has emerged from several internal Google
>> > >>>>>>projects.
>> > >>>>>> This proposal also includes committers outside of Google who are
>> > >>>>>> actively involved with other Apache projects, such as Hadoop,
>> Flink,
>> > >>>> and
>> > >>>>>> Spark.  We expect our entry into incubation will allow us to
>> expand
>> > >>> the
>> > >>>>>> number of individuals and organizations participating in Dataflow
>> > >>>>>> development. Additionally, separation of the Dataflow SDK from
>> > >>>>>>Google
>> > >>>>>> Cloud Dataflow allows us to focus on the open source SDK and model
>> > >>> and
>> > >>>>>> do what is best for this project.
>> > >>>>>>
>> > >>>>>> === Reliance on Salaried Developers ===
>> > >>>>>>
>> > >>>>>> The Dataflow SDK and Dataflow runners have been developed
>> primarily
>> > >>> by
>> > >>>>>> salaried developers supporting the Google Cloud Dataflow project.
>> > >>> While
>> > >>>>>> the Dataflow SDK and Cloud Dataflow have been developed by
>> different
>> > >>>>>> teams (and this proposal would reinforce that separation) we
>> expect
>> > >>> our
>> > >>>>>> initial set of developers will still primarily be salaried.
>> > >>>> Contribution
>> > >>>>>> has not been exclusively from salaried developers, however. For
>> > >>>> example,
>> > >>>>>> the contrib directory of the Dataflow SDK
>> > >>>>>> (https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/master/contrib)
>> > >>>>>> contains items from free-time contributors. Moreover, separate
>> > >>>> projects,
>> > >>>>>> such as ScalaFlow (https://github.com/darkjh/scalaflow) have been
>> > >>>>>> created around the Dataflow model and SDK. We expect our reliance
>> on
>> > >>>>>> salaried developers will decrease over time during incubation.
>> > >>>>>>
>> > >>>>>> === Relationship with other Apache products ===
>> > >>>>>>
>> > >>>>>> Dataflow directly interoperates with or utilizes several existing
>> > >>>> Apache
>> > >>>>>> projects.
>> > >>>>>>
>> > >>>>>>  * Build
>> > >>>>>>   * Apache Maven
>> > >>>>>>  * Data I/O, Libraries
>> > >>>>>>   * Apache Avro
>> > >>>>>>   * Apache Commons
>> > >>>>>>  * Dataflow runners
>> > >>>>>>   * Apache Flink
>> > >>>>>>   * Apache Spark
>> > >>>>>>
>> > >>>>>> Beam when used in batch mode shares similarities with Apache
>> Crunch;
>> > >>>>>> however, Beam is focused on a model, SDK, and abstraction layer
>> > >>> beyond
>> > >>>>>> Spark and Hadoop (MapReduce). One key goal of Beam is to provide
>> an
>> > >>>>>> intermediate abstraction layer which can easily be implemented and
>> > >>>>>> utilized across several different processing frameworks.
>> > >>>>>>
>> > >>>>>> === An excessive fascination with the Apache brand ===
>> > >>>>>>
>> > >>>>>> With this proposal we are not seeking attention or publicity.
>> > >>>>>>Rather,
>> > >>>> we
>> > >>>>>> firmly believe in the Beam model, SDK, and the ability to make
>> Beam
>> > >>>>>>a
>> > >>>>>> powerful yet simple framework for data processing. While the
>> > >>>>>>Dataflow
>> > >>>>>> SDK and model have been open source, we believe putting code on
>> > >>> GitHub
>> > >>>>>> can only go so far. We see the Apache community, processes, and
>> > >>> mission
>> > >>>>>> as critical for ensuring the Beam SDK and model are truly
>> > >>>>>> community-driven, positively impactful, and innovative open source
>> > >>>>>> software. While Google has taken a number of steps to advance its
>> > >>>>>> various open source projects, we believe Beam is a great fit for
>> the
>> > >>>>>> Apache Software Foundation due to its focus on data processing and
>> > >>> its
>> > >>>>>> relationships to existing ASF projects.
>> > >>>>>>
>> > >>>>>> == Documentation ==
>> > >>>>>>
>> > >>>>>> The following documentation is relevant to this proposal. Relevant
>> > >>>>>> portions of the documentation will be contributed to the Apache Beam
>> > >>>>>> project.
>> > >>>>>>
>> > >>>>>>  * Dataflow website: https://cloud.google.com/dataflow
>> > >>>>>>  * Dataflow programming model:
>> > >>>>>> https://cloud.google.com/dataflow/model/programming-model
>> > >>>>>>  * Codebases
>> > >>>>>>   * Dataflow Java SDK:
>> > >>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK
>> > >>>>>>   * Flink Dataflow runner:
>> > >>>>> https://github.com/dataArtisans/flink-dataflow
>> > >>>>>>   * Spark Dataflow runner:
>> > >>> https://github.com/cloudera/spark-dataflow
>> > >>>>>>  * Dataflow Java SDK issue tracker:
>> > >>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues
>> > >>>>>>  * google-cloud-dataflow tag on Stack Overflow:
>> > >>>>>> http://stackoverflow.com/questions/tagged/google-cloud-dataflow
>> > >>>>>>
>> > >>>>>> == Initial Source ==
>> > >>>>>>
>> > >>>>>> The initial source for Beam which we will submit to the Apache
>> > >>>>>> Foundation will include several related projects which are
>> currently
>> > >>>>>> hosted on the GitHub repositories:
>> > >>>>>>
>> > >>>>>>  * Dataflow Java SDK
>> > >>>>>> (https://github.com/GoogleCloudPlatform/DataflowJavaSDK)
>> > >>>>>>  * Flink Dataflow runner (
>> > >>>>> https://github.com/dataArtisans/flink-dataflow)
>> > >>>>>>  * Spark Dataflow runner (
>> > >>> https://github.com/cloudera/spark-dataflow)
>> > >>>>>>
>> > >>>>>> These projects have always been Apache 2.0 licensed. We intend to
>> > >>>> bundle
>> > >>>>>> all of these repositories since they are all complementary and
>> > >>>>>>should
>> > >>>> be
>> > >>>>>> maintained in one project. Prior to our submission, we will
>> combine
>> > >>> all
>> > >>>>>> of these projects into a new git repository.
>> > >>>>>>
>> > >>>>>> == Source and Intellectual Property Submission Plan ==
>> > >>>>>>
>> > >>>>>> The source for the Dataflow SDK and the three runners (Spark, Flink,
>> > >>>>>> Google Cloud Dataflow) is already licensed under an Apache 2 license.
>> > >>>>>>
>> > >>>>>>  * Dataflow SDK -
>> > >>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/LICENSE
>> > >>>>>>  * Flink runner -
>> > >>>>>> https://github.com/dataArtisans/flink-dataflow/blob/master/LICENSE
>> > >>>>>>  * Spark runner -
>> > >>>>>> https://github.com/cloudera/spark-dataflow/blob/master/LICENSE
>> > >>>>>>
>> > >>>>>> Contributors to the Dataflow SDK have also signed the Google
>> > >>> Individual
>> > >>>>>> Contributor License Agreement
>> > >>>>>> (https://cla.developers.google.com/about/google-individual) in
>> > order
>> > >>>> to
>> > >>>>>> contribute to the project.
>> > >>>>>>
>> > >>>>>> With respect to trademark rights, Google does not hold a trademark
>> > >>>>>>on
>> > >>>>>> the phrase "Dataflow." Based on feedback and guidance we receive
>> > >>> during
>> > >>>>>> the incubation process, we are open to renaming the project if
>> > >>>> necessary
>> > >>>>>> for trademark or other concerns.
>> > >>>>>>
>> > >>>>>> == External Dependencies ==
>> > >>>>>>
>> > >>>>>> All external dependencies are licensed under an Apache 2.0 or
>> > >>>>>> Apache-compatible license. As we grow the Beam community we will
>> > >>>>>> configure our build process to require and validate all
>> > >>>>>>contributions
>> > >>>>>> and dependencies are licensed under the Apache 2.0 license or are
>> > >>> under
>> > >>>>>> an Apache-compatible license.
>> > >>>>>>
>> > >>>>>> == Required Resources ==
>> > >>>>>>
>> > >>>>>> === Mailing Lists ===
>> > >>>>>>
>> > >>>>>> We currently use a mix of mailing lists. We will migrate our
>> > >>>>>>existing
>> > >>>>>> mailing lists to the following:
>> > >>>>>>
>> > >>>>>>  * d...@beam.incubator.apache.org
>> > >>>>>>  * u...@beam.incubator.apache.org
>> > >>>>>>  * priv...@beam.incubator.apache.org
>> > >>>>>>  * comm...@beam.incubator.apache.org
>> > >>>>>>
>> > >>>>>> === Source Control ===
>> > >>>>>>
>> > >>>>>> The Dataflow team currently uses Git and would like to continue to
>> > >>>>>>do
>> > >>>>>> so. We request a Git repository for Beam with mirroring to GitHub
>> > >>>>> enabled.
>> > >>>>>>
>> > >>>>>>  * https://git-wip-us.apache.org/repos/asf/incubator-beam.git
>> > >>>>>>
>> > >>>>>> === Issue Tracking ===
>> > >>>>>>
>> > >>>>>> We request the creation of an Apache-hosted JIRA. The Dataflow
>> > >>> project
>> > >>>>>> is currently using both a public GitHub issue tracker and internal
>> > >>>>>> Google issue tracking. We will migrate and combine from these two
>> > >>>>>> sources to the Apache JIRA.
>> > >>>>>>
>> > >>>>>>  * Jira ID: BEAM
>> > >>>>>>
>> > >>>>>> == Initial Committers ==
>> > >>>>>>
>> > >>>>>>  * Aljoscha Krettek        [aljos...@apache.org]
>> > >>>>>>  * Amit Sela               [amitsel...@gmail.com]
>> > >>>>>>  * Ben Chambers            [bchamb...@google.com]
>> > >>>>>>  * Craig Chambers          [chamb...@google.com]
>> > >>>>>>  * Dan Halperin            [dhalp...@google.com]
>> > >>>>>>  * Davor Bonaci            [da...@google.com]
>> > >>>>>>  * Frances Perry           [f...@google.com]
>> > >>>>>>  * James Malone            [jamesmal...@google.com]
>> > >>>>>>  * Jean-Baptiste Onofré    [jbono...@apache.org]
>> > >>>>>>  * Josh Wills              [jwi...@apache.org]
>> > >>>>>>  * Kostas Tzoumas          [kos...@data-artisans.com]
>> > >>>>>>  * Kenneth Knowles         [k...@google.com]
>> > >>>>>>  * Luke Cwik               [lc...@google.com]
>> > >>>>>>  * Maximilian Michels      [m...@apache.org]
>> > >>>>>>  * Stephan Ewen            [step...@data-artisans.com]
>> > >>>>>>  * Tom White               [t...@cloudera.com]
>> > >>>>>>  * Tyler Akidau            [taki...@google.com]
>> > >>>>>>  * Robert Bradshaw         [rober...@google.com]
>> > >>>>>>
>> > >>>>>> == Additional Interested Contributors ==
>> > >>>>>>
>> > >>>>>>  * Debo Dutta              [dedu...@cisco.com]
>> > >>>>>>  * Henry Saputra           [hsapu...@apache.org]
>> > >>>>>>  * Taylor Goetz            [ptgo...@gmail.com]
>> > >>>>>>  * James Carman            [ja...@carmanconsulting.com]
>> > >>>>>>  * Joe Witt                [joew...@apache.org]
>> > >>>>>>  * Vaibhav Gumashta        [vgumas...@hortonworks.com]
>> > >>>>>>  * Prasanth Jayachandran   [pjayachand...@hortonworks.com]
>> > >>>>>>  * Johan Edstrom           [seij...@gmail.com]
>> > >>>>>>  * Hugo Louro              [hmclo...@gmail.com]
>> > >>>>>>  * Krzysztof Sobkowiak     [krzys.sobkow...@gmail.com]
>> > >>>>>>  * Jeff Genender           [jgenen...@apache.org]
>> > >>>>>>  * Edward J. Yoon          [edward.y...@samsung.com]
>> > >>>>>>  * Hao Chen                [h...@apache.org]
>> > >>>>>>  * Byung-Gon Chun          [bgc...@gmail.com]
>> > >>>>>>  * Charitha Elvitigala     [charit...@apache.org]
>> > >>>>>>  * Alexander Bezzubov      [b...@apache.org]
>> > >>>>>>  * Tsuyoshi Ozawa          [oz...@apache.org]
>> > >>>>>>  * Mayank Bansal           [maban...@gmail.com]
>> > >>>>>>  * Supun Kamburugamuve     [su...@apache.org]
>> > >>>>>>  * Matthias Wessendorf     [mat...@apache.org]
>> > >>>>>>  * Felix Cheung            [felixche...@apache.org]
>> > >>>>>>  * Ajay Yadava             [ajay.ya...@inmobi.com]
>> > >>>>>>  * Liang Chen              [chenliang...@huawei.com]
>> > >>>>>>  * Renaud Richardet        [renaud (at) apache (dot) org]
>> > >>>>>>  * Bakey Pan               [bakey1...@gmail.com]
>> > >>>>>>  * Andreas Neumann         [a...@apache.org]
>> > >>>>>>  * Suresh Marru            [sma...@apache.org]
>> > >>>>>>  * Hadrian Zbarcea         [hzbar...@gmail.com]
>> > >>>>>>
>> > >>>>>> == Affiliations ==
>> > >>>>>>
>> > >>>>>> The initial committers are from six organizations. Google
>> developed
>> > >>>>>> Dataflow and the Dataflow SDK, data Artisans developed the Flink
>> > >>>> runner,
>> > >>>>>> and Cloudera (Labs) developed the Spark runner.
>> > >>>>>>
>> > >>>>>>  * Cloudera
>> > >>>>>>   * Tom White
>> > >>>>>>  * Data Artisans
>> > >>>>>>   * Aljoscha Krettek
>> > >>>>>>   * Kostas Tzoumas
>> > >>>>>>   * Maximilian Michels
>> > >>>>>>   * Stephan Ewen
>> > >>>>>>  * Google
>> > >>>>>>   * Ben Chambers
>> > >>>>>>   * Dan Halperin
>> > >>>>>>   * Davor Bonaci
>> > >>>>>>   * Frances Perry
>> > >>>>>>   * James Malone
>> > >>>>>>   * Kenneth Knowles
>> > >>>>>>   * Luke Cwik
>> > >>>>>>   * Tyler Akidau
>> > >>>>>>   * Robert Bradshaw
>> > >>>>>>  * PayPal
>> > >>>>>>   * Amit Sela
>> > >>>>>>  * Slack
>> > >>>>>>   * Josh Wills
>> > >>>>>>  * Talend
>> > >>>>>>   * Jean-Baptiste Onofré
>> > >>>>>>
>> > >>>>>> == Sponsors ==
>> > >>>>>>
>> > >>>>>> === Champion ===
>> > >>>>>>
>> > >>>>>>  * Jean-Baptiste Onofre         [jbono...@apache.org]
>> > >>>>>>
>> > >>>>>> === Nominated Mentors ===
>> > >>>>>>
>> > >>>>>>  * Jean-Baptiste Onofre       [jbono...@apache.org]
>> > >>>>>>  * Jim Jagielski              [j...@apache.org]
>> > >>>>>>  * Venkatesh Seetharam        [venkat...@apache.org]
>> > >>>>>>  * Bertrand Delacretaz        [bdelacre...@apache.org]
>> > >>>>>>  * Ted Dunning                [tdunn...@apache.org]
>> > >>>>>>
>> > >>>>>> === Sponsoring Entity ===
>> > >>>>>>
>> > >>>>>> The Apache Incubator
>> > >>>>>> ----
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> ---------------------------------------------------------------------
>> > >>>>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> > >>>>>> For additional commands, e-mail: general-h...@incubator.apache.org
>> > >>>>>>
>> > >>>>>>
>> > >>>>>
>> > >>>>
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Thanks and Regards,
>> > >> Mayank
>> > >> Cell: 408-718-9370
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> >
>>
>>
>> --
>> Byung-Gon Chun
>>
>
>
>
> --
> Greg Chase
>
> Director of Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/




-- 
Best Regards, Edward J. Yoon

