Talking About Beam
I wrote a piece about Beam that was published on O'Reilly: https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code?utm_medium=social&utm_source=twitter.com&utm_campaign=lgen&utm_content=data+article+ki&cmp=tw-data-na-article-lgen_tw_article

It lays out some of the thoughts and ideas that I think will help Beam adoption. I suggest reading it to get ideas for how to talk about Beam at talks and conferences. Before writing the piece, I tested how these ideas resonate with people. They really help people understand why Beam is used and how it solves the future-proofing and scale-proofing problems small companies face.

Thanks,
Jesse
Re: newbie question about beam
Hi Sergio,

It was great talking with you in Vancouver. As of today, the Python SDK is here, [1], [2]. Wasn't that fast enough ;)

Davor

[1] https://github.com/apache/incubator-beam/pull/461
[2] https://github.com/apache/incubator-beam/tree/python-sdk/sdks/python

On Tue, Jun 14, 2016 at 3:45 AM, Jean-Baptiste Onofré wrote:

> Hi Sergio,
>
> Welcome aboard, and good to discuss with you during ApacheCon.
>
> Distribution of the resources is a point related to runner, and more specifically to the execution environment of the runner. Each runner/backend will implement their own logic.
>
> I don't know Keras enough to provide a strong advice.
>
> Regarding the Python SDK, we discussed about that last week: it's on the way. We should have the Python SDK very soon (we were busy with the first release).
>
> Regards
> JB
>
> On 06/14/2016 12:38 PM, Sergio Fernández wrote:
>
>> Hi guys,
>>
>> I'm newbie in the Beam community, but as someone who has used DataFlow in the past I've been following the podling since you came to ASK. I'm very happy to see that 0.1.0-incubating is finally going out, congratulations for such great milestone.
>>
>> I discussed with some of you guys in the last ApacheCon, and for me was good to know the Python SDK was just a matter of time and should come to Beam at some point. So coming back to the original plans < http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html >, do you manage any timeline to bring the Python SDK to Beam?
>>
>> So I'd like to bring a question how Beam plans to deal with the distribution of resources across all nodes, something I know it not really clean with some runners (e.g., Spark). More concretely, we're using Keras < http://keras.io/>, a deep learning Python library that is capable of running on top of either TensorFlow or Theano. Historically I know DataFlow and TensorFlow are not very compatible. But I wonder if the project has already discussed how to support running Keras (TensorFlow) tasks on Beam. For us is more for querying than for training, so I'd like to know if the Beam Model could natively support the distribution of the models (sometimes several GB).
>>
>> Thanks in advance.
>>
>> Cheers,
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
Re: [RESULT] [VOTE] Release version 0.1.0-incubating
The Apache Incubator has unanimously approved this release, with 6 approving and binding votes. We are now proceeding with the final steps of the release.

On Sun, Jun 12, 2016 at 2:33 PM, Ismaël Mejía wrote:

> Congratulations Davor, you, JB and all the team have made a great job. I am really happy to see this release going out !
>
> And remember they used to say that the first apache release is the hardest one, so from now on it should be easier :)
>
> On Sun, Jun 12, 2016 at 8:23 AM, Jesse Anderson wrote:
>
> > Congrats on the first release!
> >
> > On Sun, Jun 12, 2016, 7:50 AM Davor Bonaci wrote:
> >
> > > I'm happy to announce that we have unanimously approved this release.
> > >
> > > There are 10 approving votes, 9 of which are binding:
> > > * Davor Bonaci
> > > * Robert Bradshaw
> > > * Ben Chambers
> > > * Dan Halperin
> > > * Kenneth Knowles
> > > * Aljoscha Krettek
> > > * James Malone
> > > * Jean-Baptiste Onofré
> > > * Amit Sela
> > > * Scott Wegner
> > >
> > > There are no disapproving votes.
> > >
> > > At this point, this proposal will be presented to the Apache Incubator for their review.
> > >
> > > Thanks everyone! Personally, I'm super excited to see our first release getting so close!
> > >
> > > Davor
> > >
> > > -- Forwarded message --
> > > From: Davor Bonaci
> > > Date: Wed, Jun 8, 2016 at 4:20 PM
> > > Subject: [VOTE] Release version 0.1.0-incubating
> > > To: dev@beam.incubator.apache.org
> > >
> > > Hi everyone,
> > > Here's the first vote for the first release of Apache Beam -- version 0.1.0-incubating!
> > >
> > > As a reminder, we aren't looking for any specific new functionality, but would like to release the existing code, get something to our users' hands, and test the processes. Previous discussions and iterations on this release have been archived on the dev@ mailing list.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * the official Apache source release to be deployed to dist.apache.org [1], and
> > > * all artifacts to be deployed to the Maven Central Repository [2].
> > >
> > > This corresponds to the tag "v0.1.0-incubating-RC3" in source control, [3].
> > >
> > > Please vote as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > For those of us enjoying our first voting experience -- the release checklist is here [4]. This is a "package release"-type of the Apache voting process [5]. As customary, the vote will be open for 72 hours. It is adopted by majority approval with at least 3 PPMC affirmative votes. If approved, the proposal will be presented to the Apache Incubator for their review.
> > >
> > > Thanks,
> > > Davor
> > >
> > > [1] https://repository.apache.org/content/repositories/orgapachebeam-1002/org/apache/beam/beam-parent/0.1.0-incubating/beam-parent-0.1.0-incubating-source-release.zip
> > > [2] https://repository.apache.org/content/repositories/orgapachebeam-1002/
> > > [3] https://github.com/apache/incubator-beam/tree/v0.1.0-incubating-RC3
> > > [4] http://incubator.apache.org/guides/releasemanagement.html#check-list
> > > [5] http://www.apache.org/foundation/voting.html
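The adoption rule quoted in the vote email (majority approval with at least 3 affirmative PPMC votes) can be sketched as a small check. This is only an illustration: the function name, the (value, binding) vote representation, and the choice to count the majority among binding votes only are assumptions made here, not part of any Apache tooling.

```python
from typing import Iterable, List, Tuple

# A vote is (value, binding): value is +1 or -1; binding marks a PPMC vote.
# This representation is hypothetical, chosen only for the sketch.
Vote = Tuple[int, bool]


def release_vote_passes(votes: Iterable[Vote], min_binding_approvals: int = 3) -> bool:
    """Return True if the vote meets the rule quoted in the thread.

    Assumption: the majority is counted among binding votes only.
    """
    votes_list: List[Vote] = list(votes)
    binding_plus = sum(1 for value, binding in votes_list if binding and value > 0)
    binding_minus = sum(1 for value, binding in votes_list if binding and value < 0)
    return binding_plus >= min_binding_approvals and binding_plus > binding_minus


# The tally reported in the thread: 10 approvals, 9 of them binding, no -1 votes.
reported = [(1, True)] * 9 + [(1, False)]
print(release_vote_passes(reported))
```

Non-binding votes are still recorded in the input but do not affect the outcome in this sketch.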
Re: Testing and the Capability Matrix
@Thomas Completely agree, this is also how it is currently handled in the Flink runner. I was talking about the presentation of the compatibility matrix on the web site, whether we should have separate columns for Flink Stream/Batch and Spark Stream/Batch. (And possibly other runners in the future)

On Tue, 14 Jun 2016 at 18:57 Thomas Groh wrote:

> It is also worth noting that this document is a snapshot rather than the long-term plan. As the SDK evolves, the annotations will almost certainly change with it (and will certainly expand).
>
> +Aljoscha
>
> For streaming/batch execution separation, this is better served by configuration in the runner's build (e.g. specifying two separate executions in the pom.xml, one for streaming and one for batch). Given that the tests live in a separate module from the runner, this is likened to how RunnableOnService tests are currently executed by all of the runners.
>
> For sink, I think given the current implementations of sink there isn't a huge need; however, most sinks should be annotated with some form of superclass (although the implementation of sink requires side inputs, so this is also worth considering).
>
> +jb
>
> These would live on the tests proper, yes.
>
> On Sun, Jun 12, 2016 at 11:05 PM, Jean-Baptiste Onofré wrote:
>
> > Hi Thomas,
> >
> > it looks good to me.
> >
> > Just curious: the proposed annotations will be directly in the Java SDK Test jar right ?
> >
> > Thanks,
> > Regards
> > JB
> >
> > On 06/11/2016 01:34 AM, Thomas Groh wrote:
> >
> >> Hey Beamers!
> >>
> >> We have a lovely Capability Matrix ( http://beam.incubator.apache.org/capability-matrix/) which describes what runners can do, and what's in the model. However, right now we only have one way to specify that a test is useful to be executed in a runner, the RunnableOnService category.
> >>
> >> I've worked on a document to expand the number of annotations to be more in line with the capability matrix, which should help runner writers test more precisely with regards to the Beam model. The document is located at https://docs.google.com/document/d/1fICxq32t9yWn9qXhmT07xpclHeHX2VlUyVtpi2WzzGM/edit?usp=sharing , and I've added edit access for all of our committers.
> >>
> >> Feel free to take a look and leave any comments you may have,
> >>
> >> Thanks,
> >>
> >> Thomas
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
Re: Apache Beam for Python
Woo hoo!

On Tue, Jun 14, 2016 at 12:41 PM, Jean-Baptiste Onofré wrote:

> Awesome ! Thanks !
>
> Agree with Davor to create a feature branch.
>
> Regards
> JB
Re: Apache Beam for Python
Awesome ! Thanks !

Agree with Davor to create a feature branch.

Regards
JB

On 06/14/2016 09:22 PM, Silviu Calinoiu wrote:

Thanks everybody for the welcoming and feedback. The initial code move was proposed as pull request #461 [1].

Looking forward to working with everybody in the Beam community and especially any Pythonistas out there.

Thanks,
Silviu

[1] https://github.com/apache/incubator-beam/pull/461

On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía wrote:

Excellent guys, Welcome to Beam !

I am looking for ways to integrate Beam with the standard notebook tools (Zeppelin / Jupyter [ipython]), so I am really happy to see the python SDK arriving to Beam, Awesome.

Ismaël Mejía

On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela wrote:

Welcome Python people ;)

I know a few people who've been waiting for this one!

On Fri, Jun 3, 2016, 19:53 Davor Bonaci wrote:

Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!

On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré wrote:

Absolutely ;)

On 06/03/2016 03:51 PM, James Malone wrote:

Hey Silviu!

I think JB is proposing we create a python directory in the sdks directory in the root repository (and modify the configuration files accordingly):

https://github.com/apache/incubator-beam/tree/master/sdks

This Beam document here titled "Apache Beam (Incubating): Repository Structure" details the proposed repository structure and may be useful:

https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc

Best,

James

On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu wrote:

Hi JB,

Thanks for the welcome! I come from the Python land so I am not quite familiar with Maven. What do you mean by a Maven module? You mean an artifact so you can install things? In Python, people are used to packages downloaded from PyPI (pypi.python.org -- which is sort of Maven for Python). Whatever is the standard way of doing things in Apache we'll do it. Just asking for clarifications.

By the way this discussion is very useful since we will have to iron out several details like this.

Thanks,
Silviu

On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré < j...@nanthrax.net> wrote:

Hi Silviu,

thanks for detailed update and great work !

I would advice to create a:

sdks/python

Maven module to store the Python SDK.

WDYT ?

By the way, welcome aboard and great to have you all guys in the team !

Regards
JB

On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:

Hi all,

My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team working on the Python SDK. As the original Beam proposal ( https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been planning to merge the Python SDK into Beam. The Python SDK is in an early stage of development (alpha milestone) and so this is a good time to move the code without causing too much disruption to our customers. Additionally, this enables the Beam community to contribute as soon as possible.

The current state of the SDK is as follows:

- Open-sourced at https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
- Model: All main concepts are present.
- I/O: SDK supports text (Google Cloud Storage) and BigQuery connectors and has a framework for adding additional sources and sinks.
- Runners: SDK has two pipeline runners: direct runner (in process, local execution) and Cloud Dataflow runner for batch pipelines (submit job to Google Dataflow service). The current direct runner is bounded only (batch execution) but there is work in progress to support unbounded (as in Java).
- Testing: The code base has unit test coverage for all the modules and several integration and end to end tests (similar in coverage to the Java SDK). Streaming is not well tested end to end yet since Cloud Dataflow focused first on batch.
- Docs: We have matching Python documentation for the features currently supported by Cloud Dataflow. The docs are on cloud.google.com (access only by whitelist due to the alpha stage of the project). Devin is working on the transition of all docs to Apache.

In the next days/weeks we would like to prepare and start migrating the code and you should start seeing some pull requests. We also hope that the Beam community will shape the SDK going forward. In particular, all the model improvements implemented for Java (Runner API, etc.) will have equivalents in Python once they stabilize. If you have any advice before we start the journey please let us know.

The team that will join the Beam effort consists of me (Silviu Calinoiu), Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least Robert Bradshaw (who is already an Apache Beam committer).

So let us know what you think!

Best regards,
Sil
Re: Apache Beam for Python
Awesome job, Silviu! Really excited to have Python SDK join us in Beam.

I'll take care of merging the pull request. Let's start with a feature branch, as per previous conversations on the dev@ list.

On Tue, Jun 14, 2016 at 12:22 PM, Silviu Calinoiu < silv...@google.com.invalid> wrote:

> Thanks everybody for the welcoming and feedback. The initial code move was proposed as pull request #461 [1].
>
> Looking forward to working with everybody in the Beam community and especially any Pythonistas out there.
>
> Thanks,
> Silviu
>
> [1] https://github.com/apache/incubator-beam/pull/461
Re: Apache Beam for Python
Thanks everybody for the welcoming and feedback. The initial code move was proposed as pull request #461 [1].

Looking forward to working with everybody in the Beam community and especially any Pythonistas out there.

Thanks,
Silviu

[1] https://github.com/apache/incubator-beam/pull/461

On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía wrote:

> Excellent guys, Welcome to Beam !
>
> I am looking for ways to integrate Beam with the standard notebook tools (Zeppelin / Jupyter [ipython]), so I am really happy to see the python SDK arriving to Beam, Awesome.
>
> Ismaël Mejía
Re: Testing and the Capability Matrix
It is also worth noting that this document is a snapshot rather than the long-term plan. As the SDK evolves, the annotations will almost certainly change with it (and will certainly expand).

+Aljoscha

For streaming/batch execution separation, this is better served by configuration in the runner's build (e.g. specifying two separate executions in the pom.xml, one for streaming and one for batch). Given that the tests live in a separate module from the runner, this is similar to how RunnableOnService tests are currently executed by all of the runners.

For sink, I think given the current implementations of sink there isn't a huge need; however, most sinks should be annotated with some form of superclass (although the implementation of sink requires side inputs, so this is also worth considering).

+jb

These would live on the tests proper, yes.

On Sun, Jun 12, 2016 at 11:05 PM, Jean-Baptiste Onofré wrote:

> Hi Thomas,
>
> it looks good to me.
>
> Just curious: the proposed annotations will be directly in the Java SDK Test jar right ?
>
> Thanks,
> Regards
> JB
>
> On 06/11/2016 01:34 AM, Thomas Groh wrote:
>
>> Hey Beamers!
>>
>> We have a lovely Capability Matrix ( http://beam.incubator.apache.org/capability-matrix/) which describes what runners can do, and what's in the model. However, right now we only have one way to specify that a test is useful to be executed in a runner, the RunnableOnService category.
>>
>> I've worked on a document to expand the number of annotations to be more in line with the capability matrix, which should help runner writers test more precisely with regards to the Beam model. The document is located at https://docs.google.com/document/d/1fICxq32t9yWn9qXhmT07xpclHeHX2VlUyVtpi2WzzGM/edit?usp=sharing , and I've added edit access for all of our committers.
>>
>> Feel free to take a look and leave any comments you may have,
>>
>> Thanks,
>>
>> Thomas
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
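The idea discussed in this thread, tagging each test with the model capabilities it exercises so that a runner only executes the tests it can support, can be sketched roughly as follows. This is not Beam's mechanism (the thread refers to JUnit categories such as RunnableOnService in the Java SDK); the `requires` decorator, the capability names, and `runnable_tests` are all hypothetical names invented for this illustration.

```python
from typing import Callable, Iterable, List


def requires(*capabilities: str) -> Callable:
    """Tag a test function with the capabilities it needs (hypothetical helper)."""
    def decorate(test: Callable) -> Callable:
        test.required_capabilities = frozenset(capabilities)
        return test
    return decorate


# Placeholder tests; the bodies are irrelevant to the selection logic.
@requires("ParDo")
def test_pardo() -> None:
    pass


@requires("ParDo", "SideInputs")
def test_side_inputs() -> None:
    pass


@requires("UnboundedSources")
def test_streaming_read() -> None:
    pass


def runnable_tests(tests: Iterable[Callable], supported: Iterable[str]) -> List[str]:
    """Return the names of tests whose required capabilities are all supported."""
    supported_set = frozenset(supported)
    return [t.__name__ for t in tests if t.required_capabilities <= supported_set]


ALL_TESTS = [test_pardo, test_side_inputs, test_streaming_read]

# A made-up batch-only runner that does not support unbounded sources.
batch_runner_capabilities = {"ParDo", "SideInputs"}
print(runnable_tests(ALL_TESTS, batch_runner_capabilities))
```

Finer-grained tags let a runner skip exactly the tests it cannot support, rather than opting in or out of a single broad category.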
Re: newbie question about beam
Hi Sergio,

Welcome aboard, and good to discuss with you during ApacheCon.

Distribution of the resources is a point related to the runner, and more specifically to the execution environment of the runner. Each runner/backend will implement its own logic.

I don't know Keras well enough to provide strong advice.

Regarding the Python SDK, we discussed that last week: it's on the way. We should have the Python SDK very soon (we were busy with the first release).

Regards
JB

On 06/14/2016 12:38 PM, Sergio Fernández wrote:

Hi guys,

I'm newbie in the Beam community, but as someone who has used DataFlow in the past I've been following the podling since you came to ASK. I'm very happy to see that 0.1.0-incubating is finally going out, congratulations for such great milestone.

I discussed with some of you guys in the last ApacheCon, and for me was good to know the Python SDK was just a matter of time and should come to Beam at some point. So coming back to the original plans < http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>, do you manage any timeline to bring the Python SDK to Beam?

So I'd like to bring a question how Beam plans to deal with the distribution of resources across all nodes, something I know it not really clean with some runners (e.g., Spark). More concretely, we're using Keras < http://keras.io/>, a deep learning Python library that is capable of running on top of either TensorFlow or Theano. Historically I know DataFlow and TensorFlow are not very compatible. But I wonder if the project has already discussed how to support running Keras (TensorFlow) tasks on Beam. For us is more for querying than for training, so I'd like to know if the Beam Model could natively support the distribution of the models (sometimes several GB).

Thanks in advance.

Cheers,

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
newbie question about beam
Hi guys,

I'm a newbie in the Beam community, but as someone who has used DataFlow in the past, I've been following the podling since you came to the ASF. I'm very happy to see that 0.1.0-incubating is finally going out; congratulations on such a great milestone.

I talked with some of you at the last ApacheCon, and it was good to know that the Python SDK was just a matter of time and should come to Beam at some point. So, coming back to the original plans < http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>, do you have a timeline for bringing the Python SDK to Beam?

I'd also like to ask how Beam plans to deal with the distribution of resources across all nodes, something I know is not really clean with some runners (e.g., Spark). More concretely, we're using Keras < http://keras.io/>, a deep learning Python library that is capable of running on top of either TensorFlow or Theano. Historically, I know DataFlow and TensorFlow are not very compatible, but I wonder if the project has already discussed how to support running Keras (TensorFlow) tasks on Beam. For us it is more about querying than training, so I'd like to know if the Beam Model could natively support the distribution of the models (sometimes several GB).

Thanks in advance.

Cheers,

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co