[RESULT][VOTE] Accept Beam into the Apache Incubator
Hi all,

this vote passed with the following result:

+1 (binding): Jean-Baptiste Onofré, Bertrand Delacretaz, Sergio Fernandez, Henry Saputra, Taylor Goetz, Jim Jagielski, Suresh Marru, Daniel Kulp, Chris Nauroth, James Taylor, Greg Stein, John D. Ament, Ted Dunning, Venkatesh Seetharam, Julian Hyde, Edward J. Yoon, Hadrian Zbarcea, Amareshwari Sriramdasu, Olivier Lamy, Tom White, Tom Barber

+1 (non-binding): Joe Witt, Alexander Bezzubov, Aljoscha Krettek, Krzysztof Sobkowiak, Felix Cheung, Supun Kamburugamuva, Prasanth Jayachandran, Ashish, Markus Geiss, Andreas Neumann, Mayank Bansal, Seshu Adunuthula, Byung-Gon Chun, Gregory Chase, Li Yang, Philip Rhodes, Naresh Agarwal, Johan Edstrom, Tsuyoshi Ozawa, Hao Chen, Renaud Richardet, Luke Han, Libin Sun

0: none

-1: none

That is 44 +1 votes (21 binding, 23 non-binding), no 0, and no -1.

Congrats to the Beam (aka Dataflow) community, and welcome to the ASF! I will now work with infra to get the resources for the project. Thanks all for your votes.

Regards
JB

On 01/28/2016 03:28 PM, Jean-Baptiste Onofré wrote:

Hi,

the Beam proposal (initially Dataflow) was proposed last week.

The complete discussion thread is available here:
http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E

As a reminder, the BeamProposal is here:
https://wiki.apache.org/incubator/BeamProposal

Given all the great feedback we received on the mailing list, we think it's time to call a vote to accept Beam into the Incubator.

Please cast your vote:
[] +1 - accept Apache Beam as a new incubating project
[] 0 - not sure
[] -1 - do not accept the Apache Beam project (because: ...)
Thanks,
Regards
JB

## page was renamed from DataflowProposal

= Apache Beam =

== Abstract ==

Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, as well as data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes such as Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). Beam also provides DSLs in different languages, allowing users to easily implement their data integration processes.

== Proposal ==

Beam is a simple, flexible, and powerful system for distributed data processing at any scale. Beam provides a unified programming model, a software development kit to define and construct data processing pipelines, and runners to execute Beam pipelines on several runtime engines, such as Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam can be used for a variety of streaming and batch data processing goals, including ETL, stream analysis, and aggregate computation. The underlying programming model for Beam provides MapReduce-like parallelism, combined with support for powerful data windowing and fine-grained correctness control.

== Background ==

Beam started as a set of Google projects (Google Cloud Dataflow) focused on making data processing easier, faster, and less costly. The Beam model is a successor to MapReduce, FlumeJava, and MillWheel inside Google and is focused on providing a unified solution for batch and stream processing.
These projects on which Beam is based have been described in several publicly available papers:

 * MapReduce - http://research.google.com/archive/mapreduce.html
 * Dataflow model - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
 * FlumeJava - http://research.google.com/pubs/pub35650.html
 * MillWheel - http://research.google.com/pubs/pub41378.html

Beam was designed from the start to provide a portable programming layer. When you define a data processing pipeline with the Beam model, you are creating a job which is capable of being processed by any number of Beam processing engines. Several engines have been developed to run Beam pipelines in other open source runtimes, including Beam runners for Apache Flink and Apache Spark. There is also a “direct runner” for execution on the developer machine (mainly for dev/debug purposes). Another runner allows a Beam program to run on a managed service, Google Cloud Dataflow, on Google Cloud Platform. The Dataflow Java SDK is already available on GitHub and is independent of the Google Cloud Dataflow service. A Python SDK is currently in active development.

In this proposal, the Beam SDKs, model, and a set of runners will be submitted as an OSS project under the ASF. The runners which are a part of this proposal include those for Spark (from Cloudera), Flink (from data Artisans), and local development (from Google); the Google Cloud Dataflow service runner is not included in this proposal. Further references to Beam will refer to the Dataflow model, SDKs, and runners which are a part of this proposal (Apache Beam) only. The initial submission will contain the already-released Java SDK; Google intends to submit the Python SDK later in the incubation process. The Google Cloud Dataflow service will continue to be one of many runners for Beam, built on Google Cloud Platform, to run Beam pipelines. Necessarily, Cloud Dataflow will develop against the Apache project additions, updates, and changes. Google Cloud Dataflow will become one user of Apache Beam and will participate in the project openly and publicly.

The Beam programming model has been designed with simplicity, scalability, and speed as key tenets. In the Beam model, you only need to think about four top-level concepts when constructing your data processing job:

 * Pipelines - The data processing job made of a series of computations including input, processing, and output
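The portable programming layer described above (define the pipeline once, then hand it to any runner) can be illustrated with a small, self-contained toy sketch. The class and method names below are illustrative only, and are not the actual Beam or Dataflow SDK API:

```python
# Toy sketch of Beam's core idea: a pipeline is an ordered series of
# transforms, defined independently of the engine that executes it.
# Pipeline and DirectRunner here mirror concepts from the proposal,
# NOT the real SDK classes.

class Pipeline:
    """A data processing job: an ordered series of transforms."""
    def __init__(self):
        self.transforms = []

    def apply(self, transform):
        self.transforms.append(transform)
        return self  # allow chaining, as the real SDKs do

class DirectRunner:
    """Executes a pipeline in-process on the developer machine,
    in the spirit of the "direct runner" mentioned above."""
    def run(self, pipeline, data):
        for transform in pipeline.transforms:
            data = transform(data)
        return data

# The same Pipeline object could be handed to a Flink, Spark, or
# Cloud Dataflow runner without changing its definition.
p = Pipeline()
p.apply(lambda rows: (r.lower() for r in rows))
p.apply(lambda rows: [r for r in rows if "beam" in r])

result = DirectRunner().run(p, ["Apache Beam", "MapReduce", "Beam runner"])
print(result)  # -> ['apache beam', 'beam runner']
```

The point of the sketch is the separation of concerns: the pipeline definition carries no knowledge of the runtime, which is what lets one program target several engines.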
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

2016-01-29 23:51 GMT+08:00 Luke Han :
> +1 (non-binding)
>
> On Fri, Jan 29, 2016 at 6:31 PM, Tom Barber wrote:
> > +1 (binding)
> >
> > On Fri, Jan 29, 2016 at 10:03 AM, Tom White wrote:
> > > +1 (binding)
> > >
> > > Tom
> > >
> > > On Thu, Jan 28, 2016 at 2:28 PM, Jean-Baptiste Onofré wrote:
> > > > [original [VOTE] email and BeamProposal text quoted in full above]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

On Fri, Jan 29, 2016 at 6:31 PM, Tom Barber wrote:
> +1 (binding)
>
> On Fri, Jan 29, 2016 at 10:03 AM, Tom White wrote:
> > +1 (binding)
> >
> > Tom
> >
> > On Thu, Jan 28, 2016 at 2:28 PM, Jean-Baptiste Onofré wrote:
> > > [original [VOTE] email and BeamProposal text quoted in full above]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding)

On Fri, Jan 29, 2016 at 10:03 AM, Tom White wrote:
> +1 (binding)
>
> Tom
>
> On Thu, Jan 28, 2016 at 2:28 PM, Jean-Baptiste Onofré wrote:
> > [original [VOTE] email and BeamProposal text quoted in full above]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding)

Tom

On Thu, Jan 28, 2016 at 2:28 PM, Jean-Baptiste Onofré wrote:
> [original [VOTE] email and BeamProposal text quoted in full above]
Re: [VOTE] Accept Beam into the Apache Incubator
+1

Olivier

On 29 January 2016 at 01:28, Jean-Baptiste Onofré wrote:
> [original [VOTE] email and BeamProposal text quoted in full above]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

On Fri, Jan 29, 2016 at 5:47 AM, Amareshwari Sriramdasu <amareshw...@apache.org> wrote:
> +1 (Binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (Binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

On Friday, January 29, 2016, Adunuthula, Seshu wrote:
> +1 (non-binding)
>
> On 1/28/16, 12:05 PM, "Julian Hyde" wrote:
>
> >+1 (binding)
> >
> >> On Jan 28, 2016, at 10:42 AM, Mayank Bansal wrote:
> >>
> >> +1 (non-binding)
> >>
> >> Thanks,
> >> Mayank
> >>
> >> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <venkat...@innerzeal.com> wrote:
> >>
> >>> +1 (binding).
> >>>
> >>> Thanks!
> >>>
> >>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning wrote:
> >>>
> >>>> +1
> >>>>
> >>>> On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament wrote:
> >>>>
> >>>>> +1
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

Regards,
Hao

On Fri, Jan 29, 2016 at 11:32 AM, Johan Edstrom wrote:
> +1
>
> > On Jan 28, 2016, at 6:34 PM, Naresh Agarwal wrote:
> >
> > +1 (non-binding)
> >
> > Thanks
> > Naresh
> >
> > On 29 Jan 2016 06:18, "Hadrian Zbarcea" wrote:
> >
> >> +1 (binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

- Tsuyoshi

On Fri, Jan 29, 2016 at 12:32 PM, Johan Edstrom wrote:
> +1
>
>> On Jan 28, 2016, at 6:34 PM, Naresh Agarwal wrote:
>>
>> +1 (non-binding)
>>
>> Thanks
>> Naresh
Re: [VOTE] Accept Beam into the Apache Incubator
+1

> On Jan 28, 2016, at 6:34 PM, Naresh Agarwal wrote:
>
> +1 (non-binding)
>
> Thanks
> Naresh
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

Thanks
Naresh

On 29 Jan 2016 06:18, "Hadrian Zbarcea" wrote:
> +1 (binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding)

Man, congrats on a job fantastically well done. This is ASF incubator participation at its best. Expectations are high now. I am looking forward to exemplary governance and speedy graduation.

Best of luck,
Hadrian

On 01/28/2016 09:28 AM, Jean-Baptiste Onofré wrote:
[...]
The Beam programming model has been designed with simplicity, scalability, and speed as key tenets. In the Beam model, you only need to think about four top-level concepts when constructing your data processing job:

 * Pipelines - The data processing job made of a series of computations including input, processing, and output
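The "powerful data windowing" mentioned in the proposal can be illustrated with a minimal sketch (a hypothetical helper, not the Beam API): timestamped elements are assigned to fixed-size windows and grouped per window, the kind of grouping the Beam model applies uniformly to batch and streaming data.

```python
# Minimal sketch of fixed windowing (hypothetical helper, NOT the
# Beam API): assign each (timestamp, value) element to a fixed-size
# window, then collect values per window.
from collections import defaultdict

def fixed_windows(events, size):
    """Group (timestamp, value) pairs into windows of `size` seconds."""
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // size) * size          # window start boundary
        windows[(start, start + size)].append(value)
    return dict(windows)

events = [(1, "a"), (4, "b"), (62, "c"), (65, "d"), (130, "e")]
print(fixed_windows(events, 60))
# {(0, 60): ['a', 'b'], (60, 120): ['c', 'd'], (120, 180): ['e']}
```

In a real Beam pipeline the same windowing declaration applies whether the input is a bounded file or an unbounded stream; this sketch only shows the grouping step in isolation.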
Re: [VOTE] Accept Beam into the Apache Incubator
On Jan 28, 2016 6:28 AM, "Jean-Baptiste Onofré" wrote:
> [...]

+1

Phil
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding).

On Fri, Jan 29, 2016 at 7:51 AM, Gregory Chase wrote:
> + 1 (non-binding), and cool name!
>
> On Thu, Jan 28, 2016 at 2:47 PM, Byung-Gon Chun wrote:
>> +1 (non-binding)
>> [...]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding)

On Thu, Jan 28, 2016 at 8:28 AM, Jean-Baptiste Onofré wrote:
> [...]
Re: [VOTE] Accept Beam into the Apache Incubator
+ 1 (non-binding), and cool name!

On Thu, Jan 28, 2016 at 2:47 PM, Byung-Gon Chun wrote:
> +1 (non-binding)
> [...]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

On Fri, Jan 29, 2016 at 5:31 AM, Adunuthula, Seshu wrote:
> +1 (non-binding)
> [...]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

On 1/28/16, 12:05 PM, "Julian Hyde" wrote:
> +1 (binding)
> [...]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding)

> On Jan 28, 2016, at 10:42 AM, Mayank Bansal wrote:
>
> +1 (non-binding)
> [...]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

Thanks,
Mayank

> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <venkat...@innerzeal.com> wrote:
>
> +1 (binding).
>
> Thanks!
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding).

Thanks!

> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning wrote:
>
> +1
Re: [VOTE] Accept Beam into the Apache Incubator
+1

> On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament wrote:
>
> +1
Re: [VOTE] Accept Beam into the Apache Incubator
+1
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding)

--Chris Nauroth
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding). -Andreas On Thu, Jan 28, 2016 at 9:28 AM, Jean-Baptiste Onofré wrote: > Hi, > > the Beam proposal (initially Dataflow) was proposed last week. > > The complete discussion thread is available here: > > > http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E > > As reminder the BeamProposal is here: > > https://wiki.apache.org/incubator/BeamProposal > > Regarding all the great feedbacks we received on the mailing list, we > think it's time to call a vote to accept Beam into the Incubator. > > Please cast your vote to: > [] +1 - accept Apache Beam as a new incubating project > [] 0 - not sure > [] -1 - do not accept the Apache Beam project (because: ...) > > Thanks, > Regards > JB > > ## page was renamed from DataflowProposal > = Apache Beam = > > == Abstract == > > Apache Beam is an open source, unified model and set of language-specific > SDKs for defining and executing data processing workflows, and also data > ingestion and integration flows, supporting Enterprise Integration Patterns > (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify > the mechanics of large-scale batch and streaming data processing and can > run on a number of runtimes like Apache Flink, Apache Spark, and Google > Cloud Dataflow (a cloud service). Beam also brings DSL in different > languages, allowing users to easily implement their data integration > processes. > > == Proposal == > > Beam is a simple, flexible, and powerful system for distributed data > processing at any scale. Beam provides a unified programming model, a > software development kit to define and construct data processing pipelines, > and runners to execute Beam pipelines in several runtime engines, like > Apache Spark, Apache Flink, or Google Cloud Dataflow. 
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)

Best,
Markus
.:: YAGNI likes a DRY KISS ::.

On Thu, Jan 28, 2016 at 8:51 AM -0800, "Ashish" wrote:
> +1 (non-binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1

Thanks
Prasanth

> On Jan 28, 2016, at 10:45 AM, Supun Kamburugamuva wrote:
>
> +1
>
> Supun..
>
> On Thu, Jan 28, 2016 at 11:43 AM, Daniel Kulp wrote:
>
>> +1
>>
>> Dan
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1

Supun..

On Thu, Jan 28, 2016 at 11:43 AM, Daniel Kulp wrote:
> +1
>
> Dan
Re: [VOTE] Accept Beam into the Apache Incubator
+1

Dan
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding).

Suresh
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding) On Thu, Jan 28, 2016 at 8:08 AM Jim Jagielski wrote: > +1 (binding) > > > On Jan 28, 2016, at 9:28 AM, Jean-Baptiste Onofré > wrote: > [quoted proposal trimmed]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding) > On Jan 28, 2016, at 9:28 AM, Jean-Baptiste Onofré wrote: > [quoted proposal trimmed]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding) Regards Krzysztof On 28.01.2016 15:28, Jean-Baptiste Onofré wrote: > [quoted proposal trimmed]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding) -Taylor > On Jan 28, 2016, at 9:28 AM, Jean-Baptiste Onofré wrote: > [quoted proposal trimmed]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding) On Thursday, January 28, 2016, Jean-Baptiste Onofré wrote: > [quoted proposal trimmed]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding) > On 28 Jan 2016, at 15:59, Alexander Bezzubov wrote: > > +1 (non-binding) > > On Thu, Jan 28, 2016 at 3:54 PM, Joe Witt wrote: >> +1 (non-binding) >> >> On Thu, Jan 28, 2016 at 9:48 AM, Sergio Fernández wrote: >>> +1 (binding) >>> [quoted proposal trimmed]
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding) On Thu, Jan 28, 2016 at 3:54 PM, Joe Witt wrote: > +1 (non-binding) > > On Thu, Jan 28, 2016 at 9:48 AM, Sergio Fernández > wrote: > > +1 (binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding) On Thu, Jan 28, 2016 at 9:48 AM, Sergio Fernández wrote: > +1 (binding)
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (binding)
Re: [VOTE] Accept Beam into the Apache Incubator
> [X] +1 - accept Apache Beam as a new incubating project -Bertrand
Re: [VOTE] Accept Beam into the Apache Incubator
Here's my +1 of course. Regards JB
[VOTE] Accept Beam into the Apache Incubator
Hi,

the Beam proposal (initially Dataflow) was proposed last week.

The complete discussion thread is available here:
http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E

As a reminder, the BeamProposal is here:
https://wiki.apache.org/incubator/BeamProposal

Given all the great feedback we received on the mailing list, we think it's time to call a vote to accept Beam into the Incubator.

Please cast your vote to:
[] +1 - accept Apache Beam as a new incubating project
[] 0 - not sure
[] -1 - do not accept the Apache Beam project (because: ...)

Thanks,
Regards
JB

## page was renamed from DataflowProposal
= Apache Beam =

== Abstract ==

Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). Beam pipelines simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). Beam also brings DSLs in different languages, allowing users to easily implement their data integration processes.

== Proposal ==

Beam is a simple, flexible, and powerful system for distributed data processing at any scale. Beam provides a unified programming model, a software development kit to define and construct data processing pipelines, and runners to execute Beam pipelines on several runtime engines, like Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam can be used for a variety of streaming or batch data processing goals including ETL, stream analysis, and aggregate computation. The underlying programming model for Beam provides MapReduce-like parallelism, combined with support for powerful data windowing and fine-grained correctness control.
== Background ==

Beam started as a set of Google projects (Google Cloud Dataflow) focused on making data processing easier, faster, and less costly. The Beam model is a successor to MapReduce, FlumeJava, and MillWheel inside Google and is focused on providing a unified solution for batch and stream processing. The projects on which Beam is based have been published in several papers made available to the public:

* MapReduce - http://research.google.com/archive/mapreduce.html
* Dataflow model - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
* FlumeJava - http://research.google.com/pubs/pub35650.html
* MillWheel - http://research.google.com/pubs/pub41378.html

Beam was designed from the start to provide a portable programming layer. When you define a data processing pipeline with the Beam model, you are creating a job which is capable of being processed by any number of Beam processing engines. Several engines have been developed to run Beam pipelines in other open source runtimes, including Beam runners for Apache Flink and Apache Spark. There is also a “direct runner”, for execution on the developer machine (mainly for dev/debug purposes). Another runner allows a Beam program to run on a managed service, Google Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is already available on GitHub, and is independent of the Google Cloud Dataflow service. A Python SDK is currently in active development.

In this proposal, the Beam SDKs, model, and a set of runners will be submitted as an OSS project under the ASF. The runners which are a part of this proposal include those for Spark (from Cloudera), Flink (from data Artisans), and local development (from Google); the Google Cloud Dataflow service runner is not included in this proposal. Further references to Beam will refer to the Dataflow model, SDKs, and runners which are a part of this proposal (Apache Beam) only.
The initial submission will contain the already-released Java SDK; Google intends to submit the Python SDK later in the incubation process. The Google Cloud Dataflow service will continue to be one of many runners for Beam, built on Google Cloud Platform, to run Beam pipelines. Necessarily, Cloud Dataflow will develop against the Apache project additions, updates, and changes. Google Cloud Dataflow will become one user of Apache Beam and will participate in the project openly and publicly.

The Beam programming model has been designed with simplicity, scalability, and speed as key tenets. In the Beam model, you only need to think about four top-level concepts when constructing your data processing job:

* Pipelines - The data processing job made of a series of computations including input, processing, and output
* PCollections - Bounded (or unbounded) datasets which represent the input, intermediate and output data in pipelines
* PTransforms - A data processing step in a pipeline in which one or