[RESULT][VOTE] Accept Beam into the Apache Incubator

2016-02-01 Thread Jean-Baptiste Onofré

Hi all,

this vote passed with the following result:

+1 (binding): Jean-Baptiste Onofré, Bertrand Delacretaz, Sergio 
Fernandez, Henry Saputra, Taylor Goetz, Jim Jagielski, Suresh Marru, 
Daniel Kulp, Chris Nauroth, James Taylor, Greg Stein, John D. Ament, Ted 
Dunning, Venkatesh Seetharam, Julian Hyde, Edward J. Yoon, Hadrian 
Zbarcea, Amareshwari Sriramdasu, Olivier Lamy, Tom White, Tom Barber
+1 (non-binding): Joe Witt, Alexander Bezzubov, Aljoscha Krettek, 
Krzysztof Sobkowiak, Felix Cheung, Supun Kamburugamuva, Prasanth 
Jayachandran, Ashish, Markus Geiss, Andreas Neumann, Mayank Bansal, 
Seshu Adunuthula, Byung-Gon Chun, Gregory Chase, Li Yang, Philip Rhodes, 
Naresh Agarwal, Johan Edstrom, Tsuyoshi Ozawa, Hao Chen, Renaud 
Richardet, Luke Han, Libin Sun

0:
-1:

In total: 44 +1 votes (21 binding, 23 non-binding), no 0 votes, and no -1 votes.

Congratulations to the Beam (a.k.a. Dataflow) community, and welcome to the ASF!
I will now work with infra to get the resources for the project.

Thanks all for your vote.

Regards
JB

On 01/28/2016 03:28 PM, Jean-Baptiste Onofré wrote:

Hi,

the Beam proposal (initially named Dataflow) was posted last week.

The complete discussion thread is available here:

http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E


As a reminder, the BeamProposal is here:

https://wiki.apache.org/incubator/BeamProposal

Given all the great feedback we received on the mailing list, we
think it's time to call a vote to accept Beam into the Incubator.

Please cast your vote to:
[] +1 - accept Apache Beam as a new incubating project
[]  0 - not sure
[] -1 - do not accept the Apache Beam project (because: ...)

Thanks,
Regards
JB

## page was renamed from DataflowProposal
= Apache Beam =

== Abstract ==

Apache Beam is an open source, unified model and set of
language-specific SDKs for defining and executing data processing
workflows, as well as data ingestion and integration flows, supporting
Enterprise Integration Patterns (EIPs) and Domain Specific Languages
(DSLs). Dataflow pipelines simplify the mechanics of large-scale batch
and streaming data processing and can run on a number of runtimes such
as Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud
service). Beam also brings DSLs in different languages, allowing users
to easily implement their data integration processes.

== Proposal ==

Beam is a simple, flexible, and powerful system for distributed data
processing at any scale. Beam provides a unified programming model, a
software development kit to define and construct data processing
pipelines, and runners to execute Beam pipelines in several runtime
engines, like Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam
can be used for a variety of streaming or batch data processing goals
including ETL, stream analysis, and aggregate computation. The
underlying programming model provides MapReduce-like parallelism,
combined with support for powerful data windowing and fine-grained
correctness control.
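The windowing idea mentioned above can be made concrete with a small
sketch in plain Python (this is not the Beam API; the sample events and
the 60-second window size are invented for illustration). Fixed windows
partition event time into equal-sized intervals, and an aggregate is
computed per window and key:

```python
from collections import defaultdict

def fixed_window(timestamp, size):
    """Return the [start, end) fixed window a timestamp falls into."""
    start = timestamp - (timestamp % size)
    return (start, start + size)

def windowed_sum(events, window_size):
    """Group (timestamp, key, value) events into fixed windows, summing values."""
    totals = defaultdict(int)
    for ts, key, value in events:
        totals[(fixed_window(ts, window_size), key)] += value
    return dict(totals)

# Events at t=5s and t=30s land in window [0, 60); t=65s lands in [60, 120).
events = [(5, "clicks", 1), (30, "clicks", 2), (65, "clicks", 4)]
print(windowed_sum(events, 60))
# {((0, 60), 'clicks'): 3, ((60, 120), 'clicks'): 4}
```

The same grouping logic applies whether the events arrive as a bounded
batch or as an unbounded stream, which is what lets one model cover both
cases.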

== Background ==

Beam started as a set of Google projects (Google Cloud Dataflow) focused
on making data processing easier, faster, and less costly. The Beam
model is a successor to MapReduce, FlumeJava, and MillWheel inside
Google and is focused on providing a unified solution for batch and
stream processing. The projects on which Beam is based are described
in several publicly available papers:

  * MapReduce - http://research.google.com/archive/mapreduce.html
  * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
  * FlumeJava - http://research.google.com/pubs/pub35650.html
  * MillWheel - http://research.google.com/pubs/pub41378.html

Beam was designed from the start to provide a portable programming
layer. When you define a data processing pipeline with the Beam model,
you are creating a job which is capable of being processed by any number
of Beam processing engines. Several runners have been developed to
execute Beam pipelines on other open source runtimes, including Beam
runners for Apache Flink and Apache Spark. There is also a “direct
runner” for execution on the developer's machine (mainly for
development and debugging purposes).
Another runner allows a Beam program to run on a managed service, Google
Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is
already available on GitHub, and is independent of the Google Cloud
Dataflow service. A Python SDK is currently in active development.
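The “portable programming layer” point — a pipeline is a description of
a computation, decoupled from the engine that executes it — can be
sketched in plain Python (illustrative classes only, not the actual
Beam/Dataflow SDK API; the `DirectRunner` name merely echoes Beam's):

```python
# A pipeline is just data: an ordered list of named transforms.
class Pipeline:
    def __init__(self):
        self.transforms = []

    def apply(self, name, fn):
        self.transforms.append((name, fn))
        return self

# One possible runner: executes each transform eagerly, in process.
class DirectRunner:
    def run(self, pipeline, data):
        for name, fn in pipeline.transforms:
            data = [fn(x) for x in data]
        return data

p = Pipeline().apply("double", lambda x: x * 2).apply("inc", lambda x: x + 1)
print(DirectRunner().run(p, [1, 2, 3]))  # [3, 5, 7]
```

Swapping in a different runner class (say, one that ships the same
transform list to a cluster) requires no change to the pipeline
definition, which is the portability property described here.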

In this proposal, the Beam SDKs, model, and a set of runners will be
submitted as an OSS project under the ASF. The runners which are a part
of this proposal include those for Spark (from Cloudera), Flink (from
data Artisans), and local development (from Google); the Google Cloud
Dataflow service runner is not included in this proposal. Further
references to Beam will refer to the Dataflow model, SDKs, and runners
which are a part of this proposal (Apache Beam) only.
The initial submission will contain the already-released Java SDK;
Google intends to submit the Python SDK later in the incubation
process. The Google Cloud Dataflow service will continue to be one of
many runners for Beam, built on Google Cloud Platform, to run Beam
pipelines. Necessarily, Cloud Dataflow will develop against the Apache
project additions, updates, and changes. Google Cloud Dataflow will
become one user of Apache Beam and will participate in the project
openly and publicly.

The Beam programming model has been designed with simplicity,
scalability, and speed as key tenets. In the Beam model, you only need
to think about four top-level concepts when constructing your data
processing job:

  * Pipelines - The data processing job made of a series of
computations including input, processing, and output

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-30 Thread Libin Sun
+1 (non-binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-29 Thread Luke Han
+1 (non-binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-29 Thread Tom Barber
+1 (binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-29 Thread Tom White
+1 (binding)

Tom


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-29 Thread Olivier Lamy
+1

Olivier


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-29 Thread Renaud Richardet
+1 (non-binding)

On Fri, Jan 29, 2016 at 5:47 AM, Amareshwari Sriramdasu <
amareshw...@apache.org> wrote:

> +1 (Binding)
>
> On Thu, Jan 28, 2016 at 7:58 PM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi,
> >
> > the Beam proposal (initially Dataflow) was proposed last week.
> >
> > The complete discussion thread is available here:
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E
> >
> > As reminder the BeamProposal is here:
> >
> > https://wiki.apache.org/incubator/BeamProposal
> >
> > Regarding all the great feedbacks we received on the mailing list, we
> > think it's time to call a vote to accept Beam into the Incubator.
> >
> > Please cast your vote to:
> > [] +1 - accept Apache Beam as a new incubating project
> > []  0 - not sure
> > [] -1 - do not accept the Apache Beam project (because: ...)
> >
> > Thanks,
> > Regards
> > JB
> > 
> > ## page was renamed from DataflowProposal
> > = Apache Beam =
> >
> > == Abstract ==
> >
> > Apache Beam is an open source, unified model and set of language-specific
> > SDKs for defining and executing data processing workflows, and also data
> > ingestion and integration flows, supporting Enterprise Integration
> Patterns
> > (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify
> > the mechanics of large-scale batch and streaming data processing and can
> > run on a number of runtimes like Apache Flink, Apache Spark, and Google
> > Cloud Dataflow (a cloud service). Beam also provides DSLs in different
> > languages, allowing users to easily implement their data integration
> > processes.
> >
> > == Proposal ==
> >
> > Beam is a simple, flexible, and powerful system for distributed data
> > processing at any scale. Beam provides a unified programming model, a
> > software development kit to define and construct data processing
> pipelines,
> > and runners to execute Beam pipelines in several runtime engines, like
> > Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam can be used
> for
> > a variety of streaming or batch data processing goals including ETL,
> stream
> > analysis, and aggregate computation. The underlying programming model for
> > Beam provides MapReduce-like parallelism, combined with support for
> > powerful data windowing, and fine-grained correctness control.
> >
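The paragraph above pairs MapReduce-like parallelism with data windowing. As a conceptual sketch only (not the Beam API), fixed windowing can be pictured as bucketing timestamped elements by window start and aggregating per bucket; `fixed_windows` below is a hypothetical helper, written here purely for illustration.

```python
# Conceptual sketch of fixed-window aggregation (not the Beam API):
# timestamped elements are bucketed into fixed-size windows, then
# aggregated per window -- the core idea behind windowed computation.

from collections import defaultdict

def fixed_windows(events, size):
    """events: (timestamp, value) pairs; size: window width in seconds."""
    windows = defaultdict(list)
    for ts, value in events:
        window_start = (ts // size) * size  # bucket by window start time
        windows[window_start].append(value)
    return dict(windows)

events = [(1, 10), (2, 20), (61, 5), (62, 7)]
per_window_sums = {start: sum(vals)
                   for start, vals in fixed_windows(events, 60).items()}
print(per_window_sums)  # {0: 30, 60: 12}
```

Beam's actual windowing is far richer (sliding and session windows, triggers, late data); this only motivates the grouping step.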
> > == Background ==
> >
> > Beam started as a set of Google projects (Google Cloud Dataflow) focused
> > on making data processing easier, faster, and less costly. The Beam model
> > is a successor to MapReduce, FlumeJava, and MillWheel inside Google and
> is
> > focused on providing a unified solution for batch and stream processing.
> > These projects on which Beam is based have been published in several
> papers
> > made available to the public:
> >
> >  * MapReduce - http://research.google.com/archive/mapreduce.html
> >  * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
> >  * FlumeJava - http://research.google.com/pubs/pub35650.html
> >  * MillWheel - http://research.google.com/pubs/pub41378.html
> >
> > Beam was designed from the start to provide a portable programming layer.
> > When you define a data processing pipeline with the Beam model, you are
> > creating a job which is capable of being processed by any number of Beam
> > processing engines. Several engines have been developed to run Beam
> > pipelines in other open source runtimes, including a Beam runner for
> Apache
> > Flink and Apache Spark. There is also a “direct runner”, for execution on
> > the developer machine (mainly for dev/debug purposes). Another runner
> > allows a Beam program to run on a managed service, Google Cloud Dataflow,
> > in Google Cloud Platform. The Dataflow Java SDK is already available on
> > GitHub, and independent from the Google Cloud Dataflow service. Another
> > Python SDK is currently in active development.
> >
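The portability described above — one pipeline definition, many interchangeable runners — can be sketched as a minimal interface. The class and function names below are illustrative stand-ins, not Beam's actual DirectRunner/FlinkRunner/SparkRunner APIs.

```python
# Hedged sketch of the "portable programming layer": the pipeline is
# just data plus transforms; the runner decides *how* to execute it
# (in-process here; on Flink, Spark, or Cloud Dataflow in the real project).

class DirectRunner:
    """Executes transforms in-process, like a dev/debug runner."""
    def execute(self, transforms, data):
        for fn in transforms:
            data = fn(data)
        return data

def run_pipeline(transforms, data, runner):
    # The same (transforms, data) definition could be handed to any
    # runner implementing execute(); only the engine changes.
    return runner.execute(transforms, data)

doubled = run_pipeline([lambda xs: [x * 2 for x in xs]], [1, 2, 3],
                       DirectRunner())
print(doubled)  # [2, 4, 6]
```

Swapping in a distributed runner would leave the pipeline definition untouched — that separation is the design point the paragraph makes.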
> > In this proposal, the Beam SDKs, model, and a set of runners will be
> > submitted as an OSS project under the ASF. The runners which are a part
> of
> > this proposal include those for Spark (from Cloudera), Flink (from data
> > Artisans), and local development (from Google); the Google Cloud Dataflow
> > service runner is not included in this proposal. Further references to
> Beam
> > will refer to the Dataflow model, SDKs, and runners which are a part of
> > this proposal (Apache Beam) only. The initial submission will contain the
> > already-released Java SDK; Google intends to submit the Python SDK later
> in
> > the incubation process. The Google Cloud Dataflow service will continue
> to
> > be one of many runners for Beam, built on Google Cloud Platform, to run
> > Beam pipelines. Necessarily, Cloud Dataflow will develop against the
> Apache
> > project additions, updates, and changes. Google Cloud Dataflow will
> become
> > one user of Apache Beam and will participate in the project openly and
> > publicly.
> >
> >

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Amareshwari Sriramdasu
+1 (Binding)

On Thu, Jan 28, 2016 at 7:58 PM, Jean-Baptiste Onofré 
wrote:


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Li Yang
+1 (non-binding)

On Friday, January 29, 2016, Adunuthula, Seshu  wrote:

> +1 (non-binding)
>
> On 1/28/16, 12:05 PM, "Julian Hyde" >
> wrote:
>
> >+1 (binding)
> >
> >> On Jan 28, 2016, at 10:42 AM, Mayank Bansal  > wrote:
> >>
> >> +1 (non-binding)
> >>
> >> Thanks,
> >> Mayank
> >>
> >> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
> >> venkat...@innerzeal.com > wrote:
> >>
> >>> +1 (binding).
> >>>
> >>> Thanks!
> >>>
> >>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning  >
> >>> wrote:
> >>>
>  +1
> 
> 
> 
>  On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament
> >
>  wrote:
> 
> > +1
> >
> > On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré
> >>
> > wrote:
> >

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Hao Chen
+1 (non-binding)

Regards,
Hao

On Fri, Jan 29, 2016 at 11:32 AM, Johan Edstrom  wrote:

> +1
>
> > On Jan 28, 2016, at 6:34 PM, Naresh Agarwal 
> wrote:
> >
> > +1  (non-binding)
> >
> > Thanks
> > Naresh
> > On 29 Jan 2016 06:18, "Hadrian Zbarcea"  wrote:
> >
> >> +1 (binding)
> >>
> >> Man, congrats on a job fantastically well done. This is ASF incubator
> >> participation at its best.
> >>
> >> Expectations are high now. I am looking forward to exemplary governance
> >> and speedy graduation.
> >>
> >> Best of luck,
> >> Hadrian
> >>
> >> On 01/28/2016 09:28 AM, Jean-Baptiste Onofré wrote:

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Tsuyoshi Ozawa
+1(non-binding)

- Tsuyoshi

On Fri, Jan 29, 2016 at 12:32 PM, Johan Edstrom  wrote:
> +1
>
>> On Jan 28, 2016, at 6:34 PM, Naresh Agarwal  wrote:
>>
>> +1  (non-binding)
>>
>> Thanks
>> Naresh
>> On 29 Jan 2016 06:18, "Hadrian Zbarcea"  wrote:
>>
>>> +1 (binding)
>>>
>>> Man, congrats on a job fantastically well done. This is ASF incubator
>>> participation at its best.
>>>
>>> Expectations are high now. I am looking forward to exemplary governance
>>> and speedy graduation.
>>>
>>> Best of luck,
>>> Hadrian
>>>
>>> On 01/28/2016 09:28 AM, Jean-Baptiste Onofré wrote:

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Johan Edstrom
+1

> On Jan 28, 2016, at 6:34 PM, Naresh Agarwal  wrote:
> 
> +1  (non-binding)
> 
> Thanks
> Naresh
> On 29 Jan 2016 06:18, "Hadrian Zbarcea"  wrote:
> 
>> +1 (binding)
>> 
>> Man, congrats on a job fantastically well done. This is ASF incubator
>> participation at its best.
>> 
>> Expectations are high now. I am looking forward to exemplary governance
>> and speedy graduation.
>> 
>> Best of luck,
>> Hadrian
>> 
>> On 01/28/2016 09:28 AM, Jean-Baptiste Onofré wrote:

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Naresh Agarwal
+1  (non-binding)

Thanks
Naresh
On 29 Jan 2016 06:18, "Hadrian Zbarcea"  wrote:

> +1 (binding)
>
> Man, congrats on a job fantastically well done. This is ASF incubator
> participation at its best.
>
> Expectations are high now. I am looking forward to exemplary governance
> and speedy graduation.
>
> Best of luck,
> Hadrian
>
> On 01/28/2016 09:28 AM, Jean-Baptiste Onofré wrote:
>
>> Hi,
>>
>> the Beam proposal (initially Dataflow) was proposed last week.
>>
>> The complete discussion thread is available here:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E
>>
>>
>> As reminder the BeamProposal is here:
>>
>> https://wiki.apache.org/incubator/BeamProposal
>>
>> Regarding all the great feedbacks we received on the mailing list, we
>> think it's time to call a vote to accept Beam into the Incubator.
>>
>> Please cast your vote to:
>> [] +1 - accept Apache Beam as a new incubating project
>> []  0 - not sure
>> [] -1 - do not accept the Apache Beam project (because: ...)
>>
>> Thanks,
>> Regards
>> JB
>> 
>> ## page was renamed from DataflowProposal
>> = Apache Beam =
>>
>> == Abstract ==
>>
>> Apache Beam is an open source, unified model and set of
>> language-specific SDKs for defining and executing data processing
>> workflows, and also data ingestion and integration flows, supporting
>> Enterprise Integration Patterns (EIPs) and Domain Specific Languages
>> (DSLs). Dataflow pipelines simplify the mechanics of large-scale batch
>> and streaming data processing and can run on a number of runtimes like
>> Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service).
>> Beam also brings DSL in different languages, allowing users to easily
>> implement their data integration processes.
>>
>> == Proposal ==
>>
>> Beam is a simple, flexible, and powerful system for distributed data
>> processing at any scale. Beam provides a unified programming model, a
>> software development kit to define and construct data processing
>> pipelines, and runners to execute Beam pipelines in several runtime
>> engines, like Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam
>> can be used for a variety of streaming or batch data processing goals
>> including ETL, stream analysis, and aggregate computation. The
>> underlying programming model for Beam provides MapReduce-like
>> parallelism, combined with support for powerful data windowing, and
>> fine-grained correctness control.
>>
>> == Background ==
>>
>> Beam started as a set of Google projects (Google Cloud Dataflow) focused
>> on making data processing easier, faster, and less costly. The Beam
>> model is a successor to MapReduce, FlumeJava, and Millwheel inside
>> Google and is focused on providing a unified solution for batch and
>> stream processing. These projects on which Beam is based have been
>> published in several papers made available to the public:
>>
>>   * MapReduce - http://research.google.com/archive/mapreduce.html
>>   * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
>>   * FlumeJava - http://research.google.com/pubs/pub35650.html
>>   * MillWheel - http://research.google.com/pubs/pub41378.html
>>
>> Beam was designed from the start to provide a portable programming
>> layer. When you define a data processing pipeline with the Beam model,
>> you are creating a job which is capable of being processed by any number
>> of Beam processing engines. Several engines have been developed to run
>> Beam pipelines in other open source runtimes, including a Beam runner
>> for Apache Flink and Apache Spark. There is also a “direct runner”, for
>> execution on the developer machine (mainly for dev/debug purposes).
>> Another runner allows a Beam program to run on a managed service, Google
>> Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is
>> already available on GitHub, and independent from the Google Cloud
>> Dataflow service. A Python SDK is currently in active development.
>>
>> In this proposal, the Beam SDKs, model, and a set of runners will be
>> submitted as an OSS project under the ASF. The runners which are a part
>> of this proposal include those for Spark (from Cloudera), Flink (from
>> data Artisans), and local development (from Google); the Google Cloud
>> Dataflow service runner is not included in this proposal. Further
>> references to Beam will refer to the Dataflow model, SDKs, and runners
>> which are a part of this proposal (Apache Beam) only. The initial
>> submission will contain the already-released Java SDK; Google intends to
>> submit the Python SDK later in the incubation process. The Google Cloud
>> Dataflow service will continue to be one of many runners for Beam, built
>> on Google Cloud Platform, to run Beam pipelines. Necessarily, Cloud
>> Dataflow will develop against the Apache project additions, updates, and
>> changes. Google Cloud Dataflow will become one use

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Hadrian Zbarcea

+1 (binding)

Man, congrats on a job fantastically well done. This is ASF incubator 
participation at its best.


Expectations are high now. I am looking forward to exemplary governance 
and speedy graduation.


Best of luck,
Hadrian

On 01/28/2016 09:28 AM, Jean-Baptiste Onofré wrote:

Hi,

the Beam proposal (initially Dataflow) was proposed last week.

The complete discussion thread is available here:

http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E


As a reminder, the BeamProposal is here:

https://wiki.apache.org/incubator/BeamProposal

Given all the great feedback we received on the mailing list, we
think it's time to call a vote to accept Beam into the Incubator.

Please cast your vote to:
[] +1 - accept Apache Beam as a new incubating project
[]  0 - not sure
[] -1 - do not accept the Apache Beam project (because: ...)

Thanks,
Regards
JB

## page was renamed from DataflowProposal
= Apache Beam =

== Abstract ==

Apache Beam is an open source, unified model and set of
language-specific SDKs for defining and executing data processing
workflows, and also data ingestion and integration flows, supporting
Enterprise Integration Patterns (EIPs) and Domain Specific Languages
(DSLs). Dataflow pipelines simplify the mechanics of large-scale batch
and streaming data processing and can run on a number of runtimes like
Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service).
Beam also brings DSLs in different languages, allowing users to easily
implement their data integration processes.

== Proposal ==

Beam is a simple, flexible, and powerful system for distributed data
processing at any scale. Beam provides a unified programming model, a
software development kit to define and construct data processing
pipelines, and runners to execute Beam pipelines in several runtime
engines, like Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam
can be used for a variety of streaming or batch data processing goals
including ETL, stream analysis, and aggregate computation. The
underlying programming model for Beam provides MapReduce-like
parallelism, combined with support for powerful data windowing, and
fine-grained correctness control.
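The windowing support mentioned above can be pictured with a small, hypothetical sketch (plain Python, not the Beam API): timestamped events are grouped into fixed-size windows before a per-window aggregate is computed.

```python
from collections import defaultdict

def fixed_windows(events, window_size):
    """Group (timestamp, value) pairs into fixed, non-overlapping windows.

    Illustrative only -- Beam's actual windowing model (see the Dataflow
    paper) also covers sliding/session windows, triggers, and late data.
    """
    windows = defaultdict(list)
    for ts, value in events:
        # Assign each event to the window containing its timestamp.
        window_start = (ts // window_size) * window_size
        windows[window_start].append(value)
    # Aggregate each window independently, here with a per-window sum.
    return {start: sum(vals) for start, vals in sorted(windows.items())}

events = [(1, 10), (3, 20), (61, 5), (125, 7), (119, 1)]
print(fixed_windows(events, 60))  # {0: 30, 60: 6, 120: 7}
```

Because each window is aggregated independently, the same per-window computation applies unchanged to a bounded batch of events or to an unbounded stream consumed incrementally.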

== Background ==

Beam started as a set of Google projects (Google Cloud Dataflow) focused
on making data processing easier, faster, and less costly. The Beam
model is a successor to MapReduce, FlumeJava, and MillWheel inside
Google and is focused on providing a unified solution for batch and
stream processing. These projects on which Beam is based have been
published in several papers made available to the public:

  * MapReduce - http://research.google.com/archive/mapreduce.html
  * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
  * FlumeJava - http://research.google.com/pubs/pub35650.html
  * MillWheel - http://research.google.com/pubs/pub41378.html

Beam was designed from the start to provide a portable programming
layer. When you define a data processing pipeline with the Beam model,
you are creating a job which is capable of being processed by any number
of Beam processing engines. Several engines have been developed to run
Beam pipelines in other open source runtimes, including a Beam runner
for Apache Flink and Apache Spark. There is also a “direct runner”, for
execution on the developer machine (mainly for dev/debug purposes).
Another runner allows a Beam program to run on a managed service, Google
Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is
already available on GitHub, and independent from the Google Cloud
Dataflow service. A Python SDK is currently in active development.
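The portability described here — one pipeline definition, many execution engines — can be sketched as a plain-Python analogy (not the actual Beam SDK; the `Pipeline` and `Runner` names below are invented for illustration):

```python
from abc import ABC, abstractmethod

class Runner(ABC):
    """An execution engine for a pipeline (stand-in for Beam runners)."""
    @abstractmethod
    def run(self, transforms, data): ...

class DirectRunner(Runner):
    """Executes the pipeline in-process, like the dev/debug direct runner."""
    def run(self, transforms, data):
        for transform in transforms:
            data = [transform(x) for x in data]
        return data

class Pipeline:
    """The pipeline definition is independent of the engine executing it."""
    def __init__(self):
        self.transforms = []

    def apply(self, fn):
        self.transforms.append(fn)
        return self

    def run(self, runner, data):
        # Any Runner implementation can execute the same definition.
        return runner.run(self.transforms, data)

p = Pipeline().apply(lambda x: x * 2).apply(lambda x: x + 1)
print(p.run(DirectRunner(), [1, 2, 3]))  # [3, 5, 7]
```

A Flink- or Spark-backed runner would implement the same `run` contract while translating the transforms to its own distributed primitives; the pipeline definition itself never changes.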

In this proposal, the Beam SDKs, model, and a set of runners will be
submitted as an OSS project under the ASF. The runners which are a part
of this proposal include those for Spark (from Cloudera), Flink (from
data Artisans), and local development (from Google); the Google Cloud
Dataflow service runner is not included in this proposal. Further
references to Beam will refer to the Dataflow model, SDKs, and runners
which are a part of this proposal (Apache Beam) only. The initial
submission will contain the already-released Java SDK; Google intends to
submit the Python SDK later in the incubation process. The Google Cloud
Dataflow service will continue to be one of many runners for Beam, built
on Google Cloud Platform, to run Beam pipelines. Necessarily, Cloud
Dataflow will develop against the Apache project additions, updates, and
changes. Google Cloud Dataflow will become one user of Apache Beam and
will participate in the project openly and publicly.

The Beam programming model has been designed with simplicity,
scalability, and speed as key tenets. In the Beam model, you only need
to think about four top-level concepts when constructing your data
processing job:

  * Pipelines - The data processing job made of a series of comput

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Phillip Rhodes
On Jan 28, 2016 6:28 AM, "Jean-Baptiste Onofré"  wrote:
>
> Hi,
>
> the Beam proposal (initially Dataflow) was proposed last week.
>
> The complete discussion thread is available here:
>
>
http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E
>
> As reminder the BeamProposal is here:
>
> https://wiki.apache.org/incubator/BeamProposal
>
> Regarding all the great feedbacks we received on the mailing list, we
think it's time to call a vote to accept Beam into the Incubator.
>
> Please cast your vote to:
> [] +1 - accept Apache Beam as a new incubating project
> []  0 - not sure
> [] -1 - do not accept the Apache Beam project (because: ...)
>

+1

Phil


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Edward J. Yoon
+1 (binding).

On Fri, Jan 29, 2016 at 7:51 AM, Gregory Chase  wrote:
> + 1 (non-binding), and cool name!
>
> On Thu, Jan 28, 2016 at 2:47 PM, Byung-Gon Chun  wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> On Fri, Jan 29, 2016 at 5:31 AM, Adunuthula, Seshu 
>> wrote:
>>
>> > +1 (non-binding)
>> >
>> > On 1/28/16, 12:05 PM, "Julian Hyde"  wrote:
>> >
>> > >+1 (binding)
>> > >
>> > >> On Jan 28, 2016, at 10:42 AM, Mayank Bansal 
>> wrote:
>> > >>
>> > >> +1 (non-binding)
>> > >>
>> > >> Thanks,
>> > >> Mayank
>> > >>
>> > >> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
>> > >> venkat...@innerzeal.com> wrote:
>> > >>
>> > >>> +1 (binding).
>> > >>>
>> > >>> Thanks!
>> > >>>
>> > >>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning 
>> > >>> wrote:
>> > >>>
>> >  +1
>> > 
>> > 
>> > 
>> >  On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament
>> > 
>> >  wrote:
>> > 
>> > > +1
>> > >
>> > > On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré
>> > >
>> > > wrote:
>> > >

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Greg Stein
+1 (binding)

On Thu, Jan 28, 2016 at 8:28 AM, Jean-Baptiste Onofré 
wrote:


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Gregory Chase
+ 1 (non-binding), and cool name!

On Thu, Jan 28, 2016 at 2:47 PM, Byung-Gon Chun  wrote:

> +1 (non-binding)
>
>
>
> On Fri, Jan 29, 2016 at 5:31 AM, Adunuthula, Seshu 
> wrote:
>
> > +1 (non-binding)
> >
> > On 1/28/16, 12:05 PM, "Julian Hyde"  wrote:
> >
> > >+1 (binding)
> > >
> > >> On Jan 28, 2016, at 10:42 AM, Mayank Bansal 
> wrote:
> > >>
> > >> +1 (non-binding)
> > >>
> > >> Thanks,
> > >> Mayank
> > >>
> > >> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
> > >> venkat...@innerzeal.com> wrote:
> > >>
> > >>> +1 (binding).
> > >>>
> > >>> Thanks!
> > >>>
> > >>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning 
> > >>> wrote:
> > >>>
> >  +1
> > 
> > 
> > 
> >  On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament
> > 
> >  wrote:
> > 
> > > +1
> > >
> > > On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré
> > >
> > > wrote:
> > >

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Byung-Gon Chun
+1 (non-binding)



On Fri, Jan 29, 2016 at 5:31 AM, Adunuthula, Seshu 
wrote:

> +1 (non-binding)
>
> On 1/28/16, 12:05 PM, "Julian Hyde"  wrote:
>
> >+1 (binding)
> >
> >> On Jan 28, 2016, at 10:42 AM, Mayank Bansal  wrote:
> >>
> >> +1 (non-binding)
> >>
> >> Thanks,
> >> Mayank
> >>
> >> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
> >> venkat...@innerzeal.com> wrote:
> >>
> >>> +1 (binding).
> >>>
> >>> Thanks!
> >>>
> >>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning 
> >>> wrote:
> >>>
>  +1
> 
> 
> 
>  On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament
> 
>  wrote:
> 
> > +1
> >
> > On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré
> >
> > wrote:
> >

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Adunuthula, Seshu
+1 (non-binding)

On 1/28/16, 12:05 PM, "Julian Hyde"  wrote:

>+1 (binding)
>
>> On Jan 28, 2016, at 10:42 AM, Mayank Bansal  wrote:
>> 
>> +1 (non-binding)
>> 
>> Thanks,
>> Mayank
>> 
>> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
>> venkat...@innerzeal.com> wrote:
>> 
>>> +1 (binding).
>>> 
>>> Thanks!
>>> 
>>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning 
>>> wrote:
>>> 
 +1
 
 
 
 On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament

 wrote:
 
> +1
> 
> On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré
>
> wrote:
> 

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Julian Hyde
+1 (binding)

> On Jan 28, 2016, at 10:42 AM, Mayank Bansal  wrote:
> 
> +1 (non-binding)
> 
> Thanks,
> Mayank
> 
> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
> venkat...@innerzeal.com> wrote:
> 
>> +1 (binding).
>> 
>> Thanks!
>> 
>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning 
>> wrote:
>> 
>>> +1
>>> 
>>> 
>>> 
>>> On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament 
>>> wrote:
>>> 
 +1
 
 On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré 
 wrote:
 

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Mayank Bansal
+1 (non-binding)

Thanks,
Mayank

On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
venkat...@innerzeal.com> wrote:

> +1 (binding).
>
> Thanks!
>
> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning 
> wrote:
>
> > +1
> >
> >
> >
> > On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament 
> > wrote:
> >
> > > +1
> > >
> > > On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré 
> > > wrote:
> > >

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Seetharam Venkatesh
+1 (binding).

Thanks!

On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning  wrote:

> +1
>
>
>
> On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament 
> wrote:
>
> > +1
> >
> > On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré 
> > wrote:
> >

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Ted Dunning
+1



On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament 
wrote:

> +1
>
> On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré 
> wrote:
>

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread John D. Ament
+1

On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré 
wrote:


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread James Taylor
+1 (binding)

On Thu, Jan 28, 2016 at 6:28 AM, Jean-Baptiste Onofré 
wrote:


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Chris Nauroth
+1 (binding)

--Chris Nauroth




On 1/28/16, 6:28 AM, "Jean-Baptiste Onofré"  wrote:

>Hi,
>
>the Beam proposal (initially Dataflow) was proposed last week.
>
>The complete discussion thread is available here:
>
>http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E
>
>As a reminder, the BeamProposal is here:
>
>https://wiki.apache.org/incubator/BeamProposal
>
>Given all the great feedback we received on the mailing list, we
>think it's time to call a vote to accept Beam into the Incubator.
>
>Please cast your vote to:
>[] +1 - accept Apache Beam as a new incubating project
>[]  0 - not sure
>[] -1 - do not accept the Apache Beam project (because: ...)
>
>Thanks,
>Regards
>JB
>
>## page was renamed from DataflowProposal
>= Apache Beam =
>
>== Abstract ==
>
>Apache Beam is an open source, unified model and set of
>language-specific SDKs for defining and executing data processing
>workflows, and also data ingestion and integration flows, supporting
>Enterprise Integration Patterns (EIPs) and Domain Specific Languages
>(DSLs). Dataflow pipelines simplify the mechanics of large-scale batch
>and streaming data processing and can run on a number of runtimes like
>Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service).
>Beam also brings DSL in different languages, allowing users to easily
>implement their data integration processes.
>
>== Proposal ==
>
>Beam is a simple, flexible, and powerful system for distributed data
>processing at any scale. Beam provides a unified programming model, a
>software development kit to define and construct data processing
>pipelines, and runners to execute Beam pipelines in several runtime
>engines, like Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam
>can be used for a variety of streaming or batch data processing goals
>including ETL, stream analysis, and aggregate computation. The
>underlying programming model for Beam provides MapReduce-like
>parallelism, combined with support for powerful data windowing, and
>fine-grained correctness control.
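The "MapReduce-like parallelism, combined with support for powerful data windowing" described above can be illustrated without any SDK. The following standalone Python sketch is illustrative only (it is not the Beam or Dataflow API; the window size and event format are assumptions): it assigns timestamped events to fixed windows and aggregates independently per (window, key) group.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # fixed one-minute windows (an assumed, illustrative choice)

def window_start(timestamp):
    """Map an event timestamp to the start of its fixed window."""
    return timestamp - (timestamp % WINDOW_SECONDS)

def windowed_sum(events):
    """Aggregate (timestamp, key, value) events per (window, key).

    This mimics, in miniature, the windowed grouping the proposal
    describes: each (window, key) group is reduced independently,
    so the groups could be spread across workers in parallel.
    """
    groups = defaultdict(int)
    for ts, key, value in events:
        groups[(window_start(ts), key)] += value
    return dict(groups)

events = [(3, "a", 1), (59, "a", 2), (61, "a", 5), (62, "b", 7)]
print(windowed_sum(events))
# {(0, 'a'): 3, (60, 'a'): 5, (60, 'b'): 7}
```

Because each (window, key) group is reduced independently of the others, a real runner can distribute those reductions across machines; that independence is what makes the model scale.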
>
>== Background ==
>
>Beam started as a set of Google projects (Google Cloud Dataflow) focused
>on making data processing easier, faster, and less costly. The Beam
>model is a successor to MapReduce, FlumeJava, and MillWheel inside
>Google and is focused on providing a unified solution for batch and
>stream processing. These projects on which Beam is based have been
>published in several papers made available to the public:
>
>  * MapReduce - http://research.google.com/archive/mapreduce.html
>  * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
>  * FlumeJava - http://research.google.com/pubs/pub35650.html
>  * MillWheel - http://research.google.com/pubs/pub41378.html
>
>Beam was designed from the start to provide a portable programming
>layer. When you define a data processing pipeline with the Beam model,
>you are creating a job which is capable of being processed by any number
>of Beam processing engines. Several engines have been developed to run
>Beam pipelines in other open source runtimes, including a Beam runner
>for Apache Flink and Apache Spark. There is also a “direct runner”, for
>execution on the developer machine (mainly for dev/debug purposes).
>Another runner allows a Beam program to run on a managed service, Google
>Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is
>already available on GitHub, and independent from the Google Cloud
>Dataflow service. A Python SDK is currently in active development.
>
>In this proposal, the Beam SDKs, model, and a set of runners will be
>submitted as an OSS project under the ASF. The runners which are a part
>of this proposal include those for Spark (from Cloudera), Flink (from
>data Artisans), and local development (from Google); the Google Cloud
>Dataflow service runner is not included in this proposal. Further
>references to Beam will refer to the Dataflow model, SDKs, and runners
>which are a part of this proposal (Apache Beam) only. The initial
>submission will contain the already-released Java SDK; Google intends to
>submit the Python SDK later in the incubation process. The Google Cloud
>Dataflow service will continue to be one of many runners for Beam, built
>on Google Cloud Platform, to run Beam pipelines. Necessarily, Cloud
>Dataflow will develop against the Apache project additions, updates, and
>changes. Google Cloud Dataflow will become one user of Apache Beam and
>will participate in the project openly and publicly.
>
>The Beam programming model has been designed with simplicity,
>scalability, and speed as key tenets. In the Beam model, you only need
>to think about four top-level concepts when constructing your data
>processing job:
>
>  * Pipelines - The data processing job made of a series of computations
>including input, processing, and output
>  * PCollections - Bounded (or unbounded) datasets 
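As a rough illustration of how these concepts compose — a pipeline of deferred computations, PCollections as the datasets flowing between them, and a local "direct runner" style execution — here is a toy sketch. It is not the Beam SDK; every class and method name below is invented for illustration:

```python
# Illustrative only — not the Beam API. A toy model of the proposal's
# concepts: a Pipeline holds a graph of computations, PCollections are
# the datasets flowing between them, and run() plays the role of a
# local "direct runner".

class PCollection:
    def __init__(self, producer):
        self.producer = producer  # zero-arg callable yielding the data

    def apply(self, transform):
        # A transform turns one PCollection into a new one (deferred).
        return transform(self)

class Pipeline:
    def __init__(self):
        self.sinks = []

    def read(self, data):
        return PCollection(lambda: list(data))

    def write(self, pcoll):
        self.sinks.append(pcoll)

    def run(self):
        # "Direct runner": evaluate each sink locally, in process.
        return [sink.producer() for sink in self.sinks]

def flat_map(fn):
    def transform(pcoll):
        return PCollection(
            lambda: [y for x in pcoll.producer() for y in fn(x)])
    return transform

def count_per_element(pcoll):
    def produce():
        counts = {}
        for element in pcoll.producer():
            counts[element] = counts.get(element, 0) + 1
        return counts
    return PCollection(produce)

# Word count: nothing executes until run() is called.
p = Pipeline()
words = p.read(["to be or not to be"]).apply(flat_map(str.split))
p.write(words.apply(count_per_element))
print(p.run())  # → [{'to': 2, 'be': 2, 'or': 1, 'not': 1}]
```

The point of the deferred `producer` callables is that the same pipeline description could, in principle, be handed to any runner — which is exactly the portability argument the proposal makes.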

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Andreas Neumann
+1 (non-binding).

-Andreas


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread markus.geiss


+1 (non-binding)

Best,

Markus

.:: YAGNI likes a DRY KISS ::.

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Prasanth Jayachandran
+1

Thanks
Prasanth

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Ashish
+1 (non-binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Supun Kamburugamuva
+1

Supun..


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Daniel Kulp
+1

Dan



> On Jan 28, 2016, at 9:28 AM, Jean-Baptiste Onofré  wrote:
> 
> Hi,
> 
> the Beam proposal (initially Dataflow) was proposed last week.
> 
> The complete discussion thread is available here:
> 
> http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E
> 
> As reminder the BeamProposal is here:
> 
> https://wiki.apache.org/incubator/BeamProposal
> 
> Regarding all the great feedbacks we received on the mailing list, we think 
> it's time to call a vote to accept Beam into the Incubator.
> 
> Please cast your vote to:
> [] +1 - accept Apache Beam as a new incubating project
> []  0 - not sure
> [] -1 - do not accept the Apache Beam project (because: ...)
> 
> Thanks,
> Regards
> JB
> 
> ## page was renamed from DataflowProposal
> = Apache Beam =
> 
> == Abstract ==
> 
> Apache Beam is an open source, unified model and set of language-specific 
> SDKs for defining and executing data processing workflows, and also data 
> ingestion and integration flows, supporting Enterprise Integration Patterns 
> (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the 
> mechanics of large-scale batch and streaming data processing and can run on a 
> number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow 
> (a cloud service). Beam also brings DSL in different languages, allowing 
> users to easily implement their data integration processes.
> 
> == Proposal ==
> 
> Beam is a simple, flexible, and powerful system for distributed data 
> processing at any scale. Beam provides a unified programming model, a 
> software development kit to define and construct data processing pipelines, 
> and runners to execute Beam pipelines in several runtime engines, like Apache 
> Spark, Apache Flink, or Google Cloud Dataflow. Beam can be used for a variety 
> of streaming or batch data processing goals including ETL, stream analysis, 
> and aggregate computation. The underlying programming model for Beam provides 
> MapReduce-like parallelism, combined with support for powerful data 
> windowing, and fine-grained correctness control.
> 
> == Background ==
> 
> Beam started as a set of Google projects (Google Cloud Dataflow) focused on 
> making data processing easier, faster, and less costly. The Beam model is a 
> successor to MapReduce, FlumeJava, and Millwheel inside Google and is focused 
> on providing a unified solution for batch and stream processing. These 
> projects on which Beam is based have been published in several papers made 
> available to the public:
> 
> * MapReduce - http://research.google.com/archive/mapreduce.html
> * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
> * FlumeJava - http://research.google.com/pubs/pub35650.html
> * MillWheel - http://research.google.com/pubs/pub41378.html
> 
> Beam was designed from the start to provide a portable programming layer. 
> When you define a data processing pipeline with the Beam model, you are 
> creating a job which is capable of being processed by any number of Beam 
> processing engines. Several engines have been developed to run Beam pipelines 
> in other open source runtimes, including a Beam runner for Apache Flink and 
> Apache Spark. There is also a “direct runner”, for execution on the developer 
> machine (mainly for dev/debug purposes). Another runner allows a Beam program 
> to run on a managed service, Google Cloud Dataflow, in Google Cloud Platform. 
> The Dataflow Java SDK is already available on GitHub, and independent from 
> the Google Cloud Dataflow service. Another Python SDK is currently in active 
> development.
> 
> In this proposal, the Beam SDKs, model, and a set of runners will be 
> submitted as an OSS project under the ASF. The runners which are a part of 
> this proposal include those for Spark (from Cloudera), Flink (from data 
> Artisans), and local development (from Google); the Google Cloud Dataflow 
> service runner is not included in this proposal. Further references to Beam 
> will refer to the Dataflow model, SDKs, and runners which are a part of this 
> proposal (Apache Beam) only. The initial submission will contain the 
> already-released Java SDK; Google intends to submit the Python SDK later in 
> the incubation process. The Google Cloud Dataflow service will continue to be 
> one of many runners for Beam, built on Google Cloud Platform, to run Beam 
> pipelines. Necessarily, Cloud Dataflow will develop against the Apache 
> project additions, updates, and changes. Google Cloud Dataflow will become 
> one user of Apache Beam and will participate in the project openly and 
> publicly.
> 
> The Beam programming model has been designed with simplicity, scalability,
> and speed as key tenets. In the Beam model, you only need to think about
> four top-level concepts when constructing your data processing job:
> 
> * Pipelines - The data processing job made of a series of computations
> including input, processing, and output.
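
How these concepts compose can be illustrated with a small, self-contained sketch. This is a toy model for illustration only, not the Beam/Dataflow SDK API; all class and method names here are invented:

```python
# Toy illustration of the Beam-style model: a Pipeline is an ordered graph
# of transforms that produce and consume PCollections (parallel collections).
# This is NOT the real Beam API; names are invented for illustration.

class PCollection:
    def __init__(self, elements):
        self.elements = list(elements)

class Pipeline:
    def __init__(self):
        self.steps = []  # (label, transform_fn) pairs, in application order

    def apply(self, label, fn):
        self.steps.append((label, fn))
        return self

    def run(self, source):
        """Execute the transform graph over the input source."""
        pcoll = PCollection(source)
        for label, fn in self.steps:
            pcoll = PCollection(fn(pcoll.elements))
        return pcoll.elements

# A word-count-style job: read lines, split into words, count per word.
p = (Pipeline()
     .apply("Split", lambda lines: [w for l in lines for w in l.split()])
     .apply("Count", lambda words: [(w, words.count(w)) for w in set(words)]))

result = dict(p.run(["hello world", "hello beam"]))
# result == {"hello": 2, "world": 1, "beam": 1}
```

In the real model the runner, not the pipeline, decides how and where each transform executes; the toy above simply runs them in order, in process.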

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Suresh Marru
+ 1 (binding).

Suresh

> On Jan 28, 2016, at 9:28 AM, Jean-Baptiste Onofré  wrote:
> 
> Hi,
> 
> the Beam proposal (initially Dataflow) was proposed last week.
> 
> The complete discussion thread is available here:
> 
> http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E
> 
> As a reminder, the BeamProposal is here:
> 
> https://wiki.apache.org/incubator/BeamProposal
> 
> Given all the great feedback we received on the mailing list, we think
> it's time to call a vote to accept Beam into the Incubator.
> 
> Please cast your vote to:
> [] +1 - accept Apache Beam as a new incubating project
> []  0 - not sure
> [] -1 - do not accept the Apache Beam project (because: ...)
> 
> Thanks,
> Regards
> JB
> 
> ## page was renamed from DataflowProposal
> = Apache Beam =
> 
> == Abstract ==
> 
> Apache Beam is an open source, unified model and set of language-specific 
> SDKs for defining and executing data processing workflows, and also data 
> ingestion and integration flows, supporting Enterprise Integration Patterns 
> (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the 
> mechanics of large-scale batch and streaming data processing and can run on a 
> number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow 
> (a cloud service). Beam also brings DSLs in different languages, allowing
> users to easily implement their data integration processes.
> 
> == Proposal ==
> 
> Beam is a simple, flexible, and powerful system for distributed data 
> processing at any scale. Beam provides a unified programming model, a 
> software development kit to define and construct data processing pipelines, 
> and runners to execute Beam pipelines in several runtime engines, like Apache 
> Spark, Apache Flink, or Google Cloud Dataflow. Beam can be used for a variety 
> of streaming or batch data processing goals including ETL, stream analysis, 
> and aggregate computation. The underlying programming model for Beam provides 
> MapReduce-like parallelism, combined with support for powerful data 
> windowing, and fine-grained correctness control.
> 
> == Background ==
> 
> Beam started as a set of Google projects (Google Cloud Dataflow) focused on 
> making data processing easier, faster, and less costly. The Beam model is a 
> successor to MapReduce, FlumeJava, and MillWheel inside Google and is focused
> on providing a unified solution for batch and stream processing. The
> projects on which Beam is based are described in several publicly
> available papers:
> 
> * MapReduce - http://research.google.com/archive/mapreduce.html
> * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
> * FlumeJava - http://research.google.com/pubs/pub35650.html
> * MillWheel - http://research.google.com/pubs/pub41378.html
> 
> Beam was designed from the start to provide a portable programming layer. 
> When you define a data processing pipeline with the Beam model, you are 
> creating a job which is capable of being processed by any number of Beam 
> processing engines. Several runners have been developed to execute Beam pipelines
> on other open source runtimes, including runners for Apache Flink and
> Apache Spark. There is also a “direct runner” for execution on the developer
> machine (mainly for dev/debug purposes). Another runner allows a Beam program 
> to run on a managed service, Google Cloud Dataflow, in Google Cloud Platform. 
> The Dataflow Java SDK is already available on GitHub, independent of
> the Google Cloud Dataflow service. A Python SDK is currently in active
> development.
> 
> In this proposal, the Beam SDKs, model, and a set of runners will be 
> submitted as an OSS project under the ASF. The runners which are a part of 
> this proposal include those for Spark (from Cloudera), Flink (from data 
> Artisans), and local development (from Google); the Google Cloud Dataflow 
> service runner is not included in this proposal. Further references to Beam 
> will refer to the Dataflow model, SDKs, and runners which are a part of this 
> proposal (Apache Beam) only. The initial submission will contain the 
> already-released Java SDK; Google intends to submit the Python SDK later in 
> the incubation process. The Google Cloud Dataflow service will continue to be 
> one of many runners for Beam, built on Google Cloud Platform, to run Beam 
> pipelines. Necessarily, Cloud Dataflow will develop against the Apache 
> project additions, updates, and changes. Google Cloud Dataflow will become 
> one user of Apache Beam and will participate in the project openly and 
> publicly.
> 
> The Beam programming model has been designed with simplicity, scalability,
> and speed as key tenets. In the Beam model, you only need to think about
> four top-level concepts when constructing your data processing job:
> 
> * Pipelines - The data processing job made of a series of computations
> including input, processing, and output.
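
The portability claim in the proposal above — one pipeline definition, executed by any of several runners — can also be sketched as a toy. Again, these names are invented for illustration and are not the real Beam runner API:

```python
# Toy sketch of runner portability: one pipeline definition (a list of
# element-wise transforms), executed by interchangeable runners.
# Invented names; NOT the real Beam runner API.

class DirectRunner:
    """Runs transforms in-process over the whole element list at once."""
    def run(self, transforms, data):
        for fn in transforms:
            data = fn(data)
        return data

class ChunkedRunner:
    """Pretend 'distributed' runner: applies element-wise transforms per
    chunk, then merges chunks - same result, different execution strategy."""
    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size

    def run(self, transforms, data):
        chunks = [data[i:i + self.chunk_size]
                  for i in range(0, len(data), self.chunk_size)]
        for fn in transforms:
            chunks = [fn(c) for c in chunks]
        return [x for c in chunks for x in c]

# The pipeline definition knows nothing about its runner.
transforms = [
    lambda xs: [x * 2 for x in xs],       # element-wise map
    lambda xs: [x for x in xs if x > 2],  # element-wise filter
]
data = [1, 2, 3, 4]

direct = DirectRunner().run(transforms, data)
chunked = ChunkedRunner().run(transforms, data)
# Both runners produce [4, 6, 8] from the same pipeline definition.
```

In Beam proper this separation is what lets the same program run on Flink, Spark, the direct runner, or the Cloud Dataflow service.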

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Felix Cheung
+1 (non-binding)

On Thu, Jan 28, 2016 at 8:08 AM Jim Jagielski  wrote:

> +1 (binding)

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Jim Jagielski
+1 (binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Sobkowiak Krzysztof
+1 (non-binding)

Regards
Krzysztof


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread P. Taylor Goetz
+1 (binding)

-Taylor


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Henry Saputra
+1 (binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Aljoscha Krettek
+1 (non-binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Alexander Bezzubov
+1 (non-binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Joe Witt
+1 (non-binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Sergio Fernández
+1 (binding)


Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Bertrand Delacretaz
> [X] +1 - accept Apache Beam as a new incubating project

-Bertrand




Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Jean-Baptiste Onofré

Here's my +1 of course.

Regards
JB


[VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Jean-Baptiste Onofré

Hi,

the Beam proposal (initially Dataflow) was proposed last week.

The complete discussion thread is available here:

http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E

As a reminder, the BeamProposal is here:

https://wiki.apache.org/incubator/BeamProposal

Given all the great feedback we received on the mailing list, we 
think it's time to call a vote to accept Beam into the Incubator.


Please cast your vote to:
[] +1 - accept Apache Beam as a new incubating project
[]  0 - not sure
[] -1 - do not accept the Apache Beam project (because: ...)

Thanks,
Regards
JB

## page was renamed from DataflowProposal
= Apache Beam =

== Abstract ==

Apache Beam is an open source, unified model and set of 
language-specific SDKs for defining and executing data processing 
workflows, and also data ingestion and integration flows, supporting 
Enterprise Integration Patterns (EIPs) and Domain Specific Languages 
(DSLs). Dataflow pipelines simplify the mechanics of large-scale batch 
and streaming data processing and can run on a number of runtimes like 
Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). 
Beam also brings DSLs in different languages, allowing users to easily 
implement their data integration processes.


== Proposal ==

Beam is a simple, flexible, and powerful system for distributed data 
processing at any scale. Beam provides a unified programming model, a 
software development kit to define and construct data processing 
pipelines, and runners to execute Beam pipelines in several runtime 
engines, like Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam 
can be used for a variety of streaming or batch data processing goals 
including ETL, stream analysis, and aggregate computation. The 
underlying programming model for Beam provides MapReduce-like 
parallelism, combined with support for powerful data windowing, and 
fine-grained correctness control.
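
The shape of that model can be sketched in miniature. The following toy 
Python snippet is NOT the actual Beam SDK (all names here are 
illustrative stand-ins): it only shows the core idea that a pipeline is 
a chain of transforms over collections, defined independently of the 
engine that will execute it.

```python
# Toy stand-ins for Beam's core abstractions -- NOT the real SDK.
from typing import Callable, Iterable, List


class PCollection:
    """A bounded dataset flowing through the pipeline (toy stand-in)."""
    def __init__(self, elements: Iterable):
        self.elements = list(elements)

    def apply(self, transform: "PTransform") -> "PCollection":
        return transform.expand(self)


class PTransform:
    """A processing step: one PCollection in, one PCollection out."""
    def __init__(self, fn: Callable[[List], List]):
        self.fn = fn

    def expand(self, pcoll: PCollection) -> PCollection:
        return PCollection(self.fn(pcoll.elements))


# A word-count-style pipeline, the canonical example shape.
lines = PCollection(["beam model", "beam runners"])
words = lines.apply(PTransform(
    lambda xs: [w for line in xs for w in line.split()]))
counts = words.apply(PTransform(
    lambda xs: [(w, xs.count(w)) for w in sorted(set(xs))]))
print(counts.elements)  # [('beam', 2), ('model', 1), ('runners', 1)]
```

Because each step only declares what happens to the data, the same 
chain could in principle be handed to any execution engine.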


== Background ==

Beam started as a set of Google projects (Google Cloud Dataflow) focused 
on making data processing easier, faster, and less costly. The Beam 
model is a successor to MapReduce, FlumeJava, and MillWheel inside 
Google and is focused on providing a unified solution for batch and 
stream processing. These projects on which Beam is based have been 
published in several papers made available to the public:


 * MapReduce - http://research.google.com/archive/mapreduce.html
 * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
 * FlumeJava - http://research.google.com/pubs/pub35650.html
 * MillWheel - http://research.google.com/pubs/pub41378.html

Beam was designed from the start to provide a portable programming 
layer. When you define a data processing pipeline with the Beam model, 
you are creating a job which is capable of being processed by any number 
of Beam processing engines. Several engines have been developed to run 
Beam pipelines in other open source runtimes, including a Beam runner 
for Apache Flink and Apache Spark. There is also a “direct runner”, for 
execution on the developer machine (mainly for dev/debug purposes). 
Another runner allows a Beam program to run on a managed service, Google 
Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is 
already available on GitHub, and is independent of the Google Cloud 
Dataflow service. Another Python SDK is currently in active development.
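
The portability claim above comes down to keeping the pipeline 
definition separate from its execution. A minimal sketch of that 
separation (plain Python with hypothetical names, not the real SDK or 
any of the actual runners):

```python
# Toy sketch -- NOT the real SDK: the pipeline definition is just data;
# a "runner" decides how to execute it, so the same job could target
# a direct runner, Flink, Spark, or Cloud Dataflow.
class Pipeline:
    def __init__(self):
        self.steps = []  # ordered list of (name, fn) transforms

    def apply(self, name, fn):
        self.steps.append((name, fn))
        return self


class DirectRunner:
    """Executes every step eagerly on the local machine (dev/debug)."""
    def run(self, pipeline, data):
        for _name, fn in pipeline.steps:
            data = fn(data)
        return data


p = (Pipeline()
     .apply("double", lambda xs: [x * 2 for x in xs])
     .apply("total", lambda xs: [sum(xs)]))

result = DirectRunner().run(p, [1, 2, 3])
print(result)  # [12]
```

A distributed runner would walk the same `p.steps` but hand each 
transform to its own engine; the pipeline object never changes.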


In this proposal, the Beam SDKs, model, and a set of runners will be 
submitted as an OSS project under the ASF. The runners which are a part 
of this proposal include those for Spark (from Cloudera), Flink (from 
data Artisans), and local development (from Google); the Google Cloud 
Dataflow service runner is not included in this proposal. Further 
references to Beam will refer to the Dataflow model, SDKs, and runners 
which are a part of this proposal (Apache Beam) only. The initial 
submission will contain the already-released Java SDK; Google intends to 
submit the Python SDK later in the incubation process. The Google Cloud 
Dataflow service will continue to be one of many runners for Beam, built 
on Google Cloud Platform, to run Beam pipelines. Necessarily, Cloud 
Dataflow will develop against the Apache project additions, updates, and 
changes. Google Cloud Dataflow will become one user of Apache Beam and 
will participate in the project openly and publicly.


The Beam programming model has been designed with simplicity, 
scalability, and speed as key tenets. In the Beam model, you only need 
to think about four top-level concepts when constructing your data 
processing job:


 * Pipelines - The data processing job made of a series of computations 
including input, processing, and output
 * PCollections - Bounded (or unbounded) datasets which represent the 
input, intermediate and output data in pipelines
 * PTransforms - A data processing step in a pipeline in which one or