Talking About Beam

2016-06-14 Thread Jesse Anderson
I wrote a piece published on O'Reilly about Beam
https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code?utm_medium=social&utm_source=twitter.com&utm_campaign=lgen&utm_content=data+article+ki&cmp=tw-data-na-article-lgen_tw_article.
It gives some of the thoughts and ideas that will help Beam adoption. I
suggest reading it to get some ideas for how to talk about Beam at talks
and conferences.

Before writing the piece, I tested how these ideas resonate with people. They
really help people understand why Beam is used and how it solves the future
proofing and scale proofing problems small companies face.

Thanks,

Jesse


Re: newbie question about beam

2016-06-14 Thread Davor Bonaci
Hi Sergio,
It was great talking with you in Vancouver.

As of today, the Python SDK is here, [1], [2]. Wasn't that fast enough ;)

Davor

[1] https://github.com/apache/incubator-beam/pull/461
[2] https://github.com/apache/incubator-beam/tree/python-sdk/sdks/python

On Tue, Jun 14, 2016 at 3:45 AM, Jean-Baptiste Onofré 
wrote:

> Hi Sergio,
>
> Welcome aboard, and it was good to talk with you at ApacheCon.
>
> Distribution of resources is a runner concern, and more specifically
> a matter for the runner's execution environment. Each runner/backend
> will implement its own logic.
>
> I don't know Keras well enough to give strong advice.
>
> Regarding the Python SDK, we discussed that last week: it's on the
> way. We should have the Python SDK very soon (we were busy with the first
> release).
>
> Regards
> JB
>
>
> On 06/14/2016 12:38 PM, Sergio Fernández wrote:
>
>> Hi guys,
>>
>> I'm a newbie in the Beam community, but as someone who has used Dataflow in
>> the past I've been following the podling since you came to the ASF. I'm very
>> happy to see that 0.1.0-incubating is finally going out; congratulations
>> on such a great milestone.
>>
>> I talked with some of you at the last ApacheCon, and it was good to hear
>> that the Python SDK was just a matter of time and should come to Beam at
>> some point. So, coming back to the original plans
>> <http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>,
>> do you have a timeline for bringing the Python SDK to Beam?
>>
>> I'd also like to ask how Beam plans to deal with the distribution of
>> resources across all nodes, something I know is not really clean with
>> some runners (e.g., Spark). More concretely, we're using Keras
>> <http://keras.io/>, a deep learning Python library that is capable of
>> running on top of either TensorFlow or Theano. Historically, I know
>> Dataflow and TensorFlow have not been very compatible, but I wonder if
>> the project has already discussed how to support running Keras
>> (TensorFlow) tasks on Beam. For us it is more about querying than
>> training, so I'd like to know if the Beam Model could natively support
>> the distribution of the models (sometimes several GB).
>>
>> Thanks in advance.
>>
>> Cheers,
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [RESULT] [VOTE] Release version 0.1.0-incubating

2016-06-14 Thread Davor Bonaci
The Apache Incubator has unanimously approved this release, with 6
approving and binding votes.

We are now proceeding with the final steps of the release.

On Sun, Jun 12, 2016 at 2:33 PM, Ismaël Mejía  wrote:

> Congratulations Davor, you, JB and all the team have done a great job. I am
> really happy to see this release going out!
>
> And remember they used to say that the first Apache release is the hardest
> one, so from now on it should be easier :)
>
>
> On Sun, Jun 12, 2016 at 8:23 AM, Jesse Anderson 
> wrote:
>
> > Congrats on the first release!
> >
> > On Sun, Jun 12, 2016, 7:50 AM Davor Bonaci 
> > wrote:
> >
> > > I'm happy to announce that we have unanimously approved this release.
> > >
> > > There are 10 approving votes, 9 of which are binding:
> > > * Davor Bonaci
> > > * Robert Bradshaw
> > > * Ben Chambers
> > > * Dan Halperin
> > > * Kenneth Knowles
> > > * Aljoscha Krettek
> > > * James Malone
> > > * Jean-Baptiste Onofré
> > > * Amit Sela
> > > * Scott Wegner
> > >
> > > There are no disapproving votes.
> > >
> > > At this point, this proposal will be presented to the Apache Incubator
> > for
> > > their review.
> > >
> > > Thanks everyone! Personally, I'm super excited to see our first release
> > > getting so close!
> > >
> > > Davor
> > >
> > > -- Forwarded message --
> > > From: Davor Bonaci 
> > > Date: Wed, Jun 8, 2016 at 4:20 PM
> > > Subject: [VOTE] Release version 0.1.0-incubating
> > > To: dev@beam.incubator.apache.org
> > >
> > >
> > > Hi everyone,
> > > Here's the first vote for the first release of Apache Beam -- version
> > > 0.1.0-incubating!
> > >
> > > As a reminder, we aren't looking for any specific new functionality,
> but
> > > would like to release the existing code, get something to our users'
> > hands,
> > > and test the processes. Previous discussions and iterations on this
> > release
> > > have been archived on the dev@ mailing list.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [1],
> > > and
> > > * all artifacts to be deployed to the Maven Central Repository [2].
> > >
> > > This corresponds to the tag "v0.1.0-incubating-RC3" in source control,
> > [3].
> > >
> > > Please vote as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > For those of us enjoying our first voting experience -- the release
> > > checklist is here [4]. This is a "package release"-type of the Apache
> > > voting process [5]. As customary, the vote will be open for 72 hours.
> It
> > is
> > > adopted by majority approval with at least 3 PPMC affirmative votes. If
> > > approved, the proposal will be presented to the Apache Incubator for
> > their
> > > review.
> > >
> > > Thanks,
> > > Davor
> > >
> > > [1]
> > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachebeam-1002/org/apache/beam/beam-parent/0.1.0-incubating/beam-parent-0.1.0-incubating-source-release.zip
> > > [2]
> > https://repository.apache.org/content/repositories/orgapachebeam-1002/
> > > [3]
> https://github.com/apache/incubator-beam/tree/v0.1.0-incubating-RC3
> > > [4]
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> > > [5] http://www.apache.org/foundation/voting.html
> > >
> >
>
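
As an illustration of the approval rule quoted above (majority approval with
at least 3 binding affirmative votes), here is a minimal Python sketch. It
assumes that only binding votes count toward the majority. The Vote type, the
release_approved function and the voter labels are hypothetical names made up
for this example; they are not part of any Apache tooling. The tally mirrors
the numbers reported in this thread (10 approving votes, 9 of them binding,
none disapproving) without guessing which voter was non-binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str      # hypothetical label, not a real voter name
    value: int      # +1 to approve, -1 to disapprove
    binding: bool   # True when the vote is binding (e.g. cast by a PPMC member)


def release_approved(votes, min_binding_approvals=3):
    """Return True under the assumed rule: more binding +1 than binding -1
    votes, and at least `min_binding_approvals` binding +1 votes."""
    binding = [v for v in votes if v.binding]
    approvals = sum(1 for v in binding if v.value > 0)
    disapprovals = sum(1 for v in binding if v.value < 0)
    return approvals > disapprovals and approvals >= min_binding_approvals


# Tally reported in the thread: 10 approving votes, 9 of them binding.
thread_votes = [Vote(f"binding-{i}", +1, True) for i in range(9)]
thread_votes.append(Vote("non-binding-0", +1, False))

print(release_approved(thread_votes))  # True
```

With only two binding approvals the same function returns False, no matter
how many non-binding approvals are added.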


Re: Testing and the Capability Matrix

2016-06-14 Thread Aljoscha Krettek
@Thomas Completely agree, this is also how it is currently handled in the
Flink runner. I was talking about the presentation of the capability
matrix on the web site, and whether we should have separate columns for
Flink Stream/Batch and Spark Stream/Batch (and possibly other runners in the
future).

On Tue, 14 Jun 2016 at 18:57 Thomas Groh  wrote:

> It is also worth noting that this document is a snapshot rather than the
> long-term plan. As the SDK evolves, the annotations will almost certainly
> change with it (and will certainly expand).
>
> +Aljoscha
>
> For streaming/batch execution separation, this is better served by
> configuration in the runner's build (e.g. specifying two separate
> executions in the pom.xml, one for streaming and one for batch). Given that
> the tests live in a separate module from the runner, this is similar to how
> RunnableOnService tests are currently executed by all of the runners.
>
> For sink, I think given the current implementations of sink there isn't a
> huge need; however, most sinks should be annotated with some form of
> superclass (although the implementation of sink requires side inputs, so
> this is also worth considering).
>
> +jb
>
> These would live on the tests proper, yes.
>
> On Sun, Jun 12, 2016 at 11:05 PM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi Thomas,
> >
> > it looks good to me.
> >
> > Just curious: the proposed annotations will be directly in the Java SDK
> > Test jar right ?
> >
> > Thanks,
> > Regards
> > JB
> >
> >
> > On 06/11/2016 01:34 AM, Thomas Groh wrote:
> >
> >> Hey Beamers!
> >>
> >> We have a lovely Capability Matrix (
> >> http://beam.incubator.apache.org/capability-matrix/) which describes
> what
> >> runners can do, and what's in the model. However, right now we only have
> >> one way to specify that a test is useful to be executed in a runner, the
> >> RunnableOnService category.
> >>
> >> I've worked on a document to expand the number of annotations to be more
> >> in
> >> line with the capability matrix, which should help runner writers test
> >> more
> >> precisely with regards to the Beam model. The document is located at
> >>
> >>
> https://docs.google.com/document/d/1fICxq32t9yWn9qXhmT07xpclHeHX2VlUyVtpi2WzzGM/edit?usp=sharing
> >> ,
> >> and I've added edit access for all of our committers.
> >>
> >> Feel free to take a look and leave any comments you may have,
> >>
> >> Thanks,
> >>
> >> Thomas
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: Apache Beam for Python

2016-06-14 Thread Robert Bradshaw
Woo hoo!

On Tue, Jun 14, 2016 at 12:41 PM, Jean-Baptiste Onofré  
wrote:
> Awesome ! Thanks !
>
> Agree with Davor to create a feature branch.
>
> Regards
> JB
>
>
> On 06/14/2016 09:22 PM, Silviu Calinoiu wrote:
>>
>> Thanks everybody for the welcoming and feedback. The initial code move was
>> proposed as pull request #461 [1].
>>
>> Looking forward to working with everybody in the Beam community and
>> especially any Pythonistas out there.
>>
>> Thanks,
>> Silviu
>>
>> [1] https://github.com/apache/incubator-beam/pull/461
>>
>> On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía  wrote:
>>
>>> Excellent guys, Welcome to Beam !
>>>
>>> I am looking for ways to integrate Beam with the standard notebook tools
>>> (Zeppelin / Jupyter [IPython]), so I am really happy to see the Python SDK
>>> arriving in Beam. Awesome.
>>>
>>> Ismaël Mejía
>>>
>>> On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela  wrote:
>>>
>>>> Welcome Python people ;)
>>>>
>>>> I know a few people who've been waiting for this one!
>>>>
>>>> On Fri, Jun 3, 2016, 19:53 Davor Bonaci wrote:
>>>>
>>>>> Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
>>>>>
>>>>> On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré wrote:
>>>>>
>>>>>> Absolutely ;)
>>>>>>
>>>>>> On 06/03/2016 03:51 PM, James Malone wrote:
>>>>>>
>>>>>>> Hey Silviu!
>>>>>>>
>>>>>>> I think JB is proposing we create a python directory in the sdks
>>>>>>> directory in the root repository (and modify the configuration
>>>>>>> files accordingly):
>>>>>>>
>>>>>>> https://github.com/apache/incubator-beam/tree/master/sdks
>>>>>>>
>>>>>>> This Beam document here titled "Apache Beam (Incubating):
>>>>>>> Repository Structure" details the proposed repository structure
>>>>>>> and may be useful:
>>>>>>>
>>>>>>> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> James
>>>>>>>
>>>>>>> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu wrote:
>>>>>>>
>>>>>>>> Hi JB,
>>>>>>>>
>>>>>>>> Thanks for the welcome! I come from the Python land so I am not
>>>>>>>> quite familiar with Maven. What do you mean by a Maven module?
>>>>>>>> You mean an artifact so you can install things? In Python,
>>>>>>>> people are used to packages downloaded from PyPI
>>>>>>>> (pypi.python.org -- which is sort of Maven for Python). Whatever
>>>>>>>> is the standard way of doing things in Apache we'll do it. Just
>>>>>>>> asking for clarifications.
>>>>>>>>
>>>>>>>> By the way this discussion is very useful since we will have to
>>>>>>>> iron out several details like this.
>>>>>>>> Thanks,
>>>>>>>> Silviu
>>>>>>>>
>>>>>>>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré
>>>>>>>> <j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Hi Silviu,
>>>>>>>>>
>>>>>>>>> thanks for detailed update and great work !
>>>>>>>>>
>>>>>>>>> I would advice to create a:
>>>>>>>>>
>>>>>>>>> sdks/python
>>>>>>>>>
>>>>>>>>> Maven module to store the Python SDK.
>>>>>>>>>
>>>>>>>>> WDYT ?
>>>>>>>>>
>>>>>>>>> By the way, welcome aboard and great to have you all guys in
>>>>>>>>> the team !
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> My name is Silviu Calinoiu and I am a member of the Cloud
>>>>>>>>>> Dataflow team working on the Python SDK. As the original Beam
>>>>>>>>>> proposal (https://wiki.apache.org/incubator/BeamProposal)
>>>>>>>>>> mentioned, we have been planning to merge the Python SDK into
>>>>>>>>>> Beam. The Python SDK is in an early stage of development
>>>>>>>>>> (alpha milestone) and so this is a good time to move the code
>>>>>>>>>> without causing too much disruption to our customers.
>>>>>>>>>> Additionally, this enables the Beam community to contribute as
>>>>>>>>>> soon as possible.
>>>>>>>>>>
>>>>>>>>>> The current state of the SDK is as follows:
>>>>>>>>>>
>>>>>>>>>>   - Open-sourced at
>>>>>>>>>>     https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>>>>>>>>>>
>>>>>>>>>>   - Model: All main concepts are present.
>>>>>>>>>>
>>>>>>>>>>   - I/O: SDK supports text (Google Cloud Storage) and BigQuery
>>>>>>>>>>     connectors and has a framework for adding additional
>>>>>>>>>>     sources and sinks.
>>>>>>>>>>
>>>>>>>>>>   - Runners: SDK has two pipeline runners: direct r

Re: Apache Beam for Python

2016-06-14 Thread Jean-Baptiste Onofré

Awesome ! Thanks !

Agree with Davor to create a feature branch.

Regards
JB

On 06/14/2016 09:22 PM, Silviu Calinoiu wrote:

Thanks everybody for the welcoming and feedback. The initial code move was
proposed as pull request #461 [1].

Looking forward to working with everybody in the Beam community and
especially any Pythonistas out there.

Thanks,
Silviu

[1] https://github.com/apache/incubator-beam/pull/461

On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía  wrote:


Excellent guys, Welcome to Beam !

I am looking for ways to integrate Beam with the standard notebook tools
(Zeppelin / Jupyter [IPython]), so I am really happy to see the Python SDK
arriving in Beam. Awesome.

Ismaël Mejía

On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela  wrote:


Welcome Python people ;)

I know a few people who've been waiting for this one!

On Fri, Jun 3, 2016, 19:53 Davor Bonaci 

wrote:



Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!

On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré 
wrote:


Absolutely ;)


On 06/03/2016 03:51 PM, James Malone wrote:


Hey Silviu!

I think JB is proposing we create a python directory in the sdks directory
in the root repository (and modify the configuration files accordingly):

https://github.com/apache/incubator-beam/tree/master/sdks

This Beam document here titled "Apache Beam (Incubating): Repository
Structure" details the proposed repository structure and may be useful:

https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc


Best,

James



On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu

wrote:

Hi JB,

Thanks for the welcome! I come from the Python land so I am not quite
familiar with Maven. What do you mean by a Maven module? You mean an
artifact so you can install things? In Python, people are used to packages
downloaded from PyPI (pypi.python.org -- which is sort of Maven for
Python). Whatever is the standard way of doing things in Apache we'll do
it. Just asking for clarifications.

By the way this discussion is very useful since we will have to iron out
several details like this.
Thanks,
Silviu

On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi Silviu,

thanks for detailed update and great work !

I would advice to create a:

sdks/python

Maven module to store the Python SDK.

WDYT ?

By the way, welcome aboard and great to have you all guys in the team !


Regards
JB

On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:

Hi all,


My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
working on the Python SDK. As the original Beam proposal (
https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been
planning to merge the Python SDK into Beam. The Python SDK is in an early
stage of development (alpha milestone) and so this is a good time to move
the code without causing too much disruption to our customers.
Additionally, this enables the Beam community to contribute as soon as
possible.

The current state of the SDK is as follows:

  - Open-sourced at
    https://github.com/GoogleCloudPlatform/DataflowPythonSDK/

  - Model: All main concepts are present.

  - I/O: SDK supports text (Google Cloud Storage) and BigQuery connectors
    and has a framework for adding additional sources and sinks.

  - Runners: SDK has two pipeline runners: direct runner (in process, local
    execution) and Cloud Dataflow runner for batch pipelines (submit job to
    Google Dataflow service). The current direct runner is bounded only
    (batch execution) but there is work in progress to support unbounded
    (as in Java).

  - Testing: The code base has unit test coverage for all the modules and
    several integration and end to end tests (similar in coverage to the
    Java SDK). Streaming is not well tested end to end yet since Cloud
    Dataflow focused first on batch.

  - Docs: We have matching Python documentation for the features currently
    supported by Cloud Dataflow. The docs are on cloud.google.com (access
    only by whitelist due to the alpha stage of the project). Devin is
    working on the transition of all docs to Apache.

In the next days/weeks we would like to prepare and start migrating the
code and you should start seeing some pull requests. We also hope that the
Beam community will shape the SDK going forward. In particular, all the
model improvements implemented for Java (Runner API, etc.) will have
equivalents in Python once they stabilize. If you have any advice before we
start the journey please let us know.

The team that will join the Beam effort consists of me (Silviu Calinoiu),
Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
Robert Bradshaw (who is already an Apache Beam committer).

So let us know what you think!

Best regards,

Sil

Re: Apache Beam for Python

2016-06-14 Thread Davor Bonaci
Awesome job, Silviu! Really excited to have Python SDK join us in Beam.

I'll take care of merging the pull request. Let's start with a feature
branch, as per previous conversations on the dev@ list.

On Tue, Jun 14, 2016 at 12:22 PM, Silviu Calinoiu <
silv...@google.com.invalid> wrote:

> Thanks everybody for the welcoming and feedback. The initial code move was
> proposed as pull request #461 [1].
>
> Looking forward to working with everybody in the Beam community and
> especially any Pythonistas out there.
>
> Thanks,
> Silviu
>
> [1] https://github.com/apache/incubator-beam/pull/461
>
> On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía  wrote:
>
> > Excellent guys, Welcome to Beam !
> >
> > I am looking for ways to integrate Beam with the standard notebook tools
> > (Zeppelin / Jupyter [IPython]), so I am really happy to see the Python SDK
> > arriving in Beam. Awesome.
> >
> > Ismaël Mejía
> >
> > On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela  wrote:
> >
> > > Welcome Python people ;)
> > >
> > > I know a few people who've been waiting for this one!
> > >
> > > On Fri, Jun 3, 2016, 19:53 Davor Bonaci 
> > wrote:
> > >
> > > > Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
> > > >
> > > > On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> > > > wrote:
> > > >
> > > > > Absolutely ;)
> > > > >
> > > > >
> > > > > On 06/03/2016 03:51 PM, James Malone wrote:
> > > > >
> > > > >> Hey Silviu!
> > > > >>
> > > > >> I think JB is proposing we create a python directory in the sdks
> > > > directory
> > > > >> in the root repository (and modify the configuration files
> > > accordingly):
> > > > >>
> > > > >> https://github.com/apache/incubator-beam/tree/master/sdks
> > > > >>
> > > > >> This Beam document here titled "Apache Beam (Incubating):
> Repository
> > > > >> Structure" details the proposed repository structure and may be
> > > useful:
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
> > > > >>
> > > > >> Best,
> > > > >>
> > > > >> James
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu
> > > > >> 
> > > > >> wrote:
> > > > >>
> > > > > >>> Hi JB,
> > > > > >>>
> > > > > >>> Thanks for the welcome! I come from the Python land so I am
> > > > > >>> not quite familiar with Maven. What do you mean by a Maven
> > > > > >>> module? You mean an artifact so you can install things? In
> > > > > >>> Python, people are used to packages downloaded from PyPI
> > > > > >>> (pypi.python.org -- which is sort of Maven for Python).
> > > > > >>> Whatever is the standard way of doing things in Apache we'll
> > > > > >>> do it. Just asking for clarifications.
> > > > > >>>
> > > > > >>> By the way this discussion is very useful since we will have
> > > > > >>> to iron out several details like this.
> > > > > >>> Thanks,
> > > > > >>> Silviu
> > > > > >>>
> > > > > >>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré
> > > > > >>> <j...@nanthrax.net> wrote:
> > > > > >>>
> > > > > >>>> Hi Silviu,
> > > > > >>>>
> > > > > >>>> thanks for detailed update and great work !
> > > > > >>>>
> > > > > >>>> I would advice to create a:
> > > > > >>>>
> > > > > >>>> sdks/python
> > > > > >>>>
> > > > > >>>> Maven module to store the Python SDK.
> > > > > >>>>
> > > > > >>>> WDYT ?
> > > > > >>>>
> > > > > >>>> By the way, welcome aboard and great to have you all guys
> > > > > >>>> in the team !
> > > > > >>>>
> > > > > >>>> Regards
> > > > > >>>> JB
> > > > > >>>>
> > > > > >>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
> > > > > >>>>
> > > > > >>>>> Hi all,
> > > > > >>>>>
> > > > > >>>>> My name is Silviu Calinoiu and I am a member of the Cloud
> > > > > >>>>> Dataflow team working on the Python SDK. As the original
> > > > > >>>>> Beam proposal
> > > > > >>>>> (https://wiki.apache.org/incubator/BeamProposal)
> > > > > >>>>> mentioned, we have been planning to merge the Python SDK
> > > > > >>>>> into Beam. The Python SDK is in an early stage of
> > > > > >>>>> development (alpha milestone) and so this is a good time
> > > > > >>>>> to move the code without causing too much disruption to
> > > > > >>>>> our customers. Additionally, this enables the Beam
> > > > > >>>>> community to contribute as soon as possible.
> > > > > >>>>>
> > > > > >>>>> The current state of the SDK is as follows:
> > > > > >>>>>
> > > > > >>>>>   - Open-sourced at
> > > > > >>>>>     https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
> > > > > >>>>>
> > > > > >>>>>   - Model: All main concepts are present.
> > > > > >>>>>
> > > > > >>>>>   - I/O: SDK supports text (Google Cloud Storage) and
> > > > > >>>>>     BigQuery connectors and has a framework for adding
> > > > > >>>>>     additional sources and sinks.

Re: Apache Beam for Python

2016-06-14 Thread Silviu Calinoiu
Thanks everybody for the welcoming and feedback. The initial code move was
proposed as pull request #461 [1].

Looking forward to working with everybody in the Beam community and
especially any Pythonistas out there.

Thanks,
Silviu

[1] https://github.com/apache/incubator-beam/pull/461

On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía  wrote:

> Excellent guys, Welcome to Beam !
>
> I am looking for ways to integrate Beam with the standard notebook tools
> (Zeppelin / Jupyter [IPython]), so I am really happy to see the Python SDK
> arriving in Beam. Awesome.
>
> Ismaël Mejía
>
> On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela  wrote:
>
> > Welcome Python people ;)
> >
> > I know a few people who've been waiting for this one!
> >
> > On Fri, Jun 3, 2016, 19:53 Davor Bonaci 
> wrote:
> >
> > > Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
> > >
> > > On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré 
> > > wrote:
> > >
> > > > Absolutely ;)
> > > >
> > > >
> > > > On 06/03/2016 03:51 PM, James Malone wrote:
> > > >
> > > >> Hey Silviu!
> > > >>
> > > >> I think JB is proposing we create a python directory in the sdks
> > > directory
> > > >> in the root repository (and modify the configuration files
> > accordingly):
> > > >>
> > > >> https://github.com/apache/incubator-beam/tree/master/sdks
> > > >>
> > > >> This Beam document here titled "Apache Beam (Incubating): Repository
> > > >> Structure" details the proposed repository structure and may be
> > useful:
> > > >>
> > > >>
> > > >>
> > > >>
> > >
> >
> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
> > > >>
> > > >> Best,
> > > >>
> > > >> James
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu
> > > >> 
> > > >> wrote:
> > > >>
> > > >>> Hi JB,
> > > >>>
> > > >>> Thanks for the welcome! I come from the Python land so I am not
> > > >>> quite familiar with Maven. What do you mean by a Maven module?
> > > >>> You mean an artifact so you can install things? In Python,
> > > >>> people are used to packages downloaded from PyPI
> > > >>> (pypi.python.org -- which is sort of Maven for Python). Whatever
> > > >>> is the standard way of doing things in Apache we'll do it. Just
> > > >>> asking for clarifications.
> > > >>>
> > > >>> By the way this discussion is very useful since we will have to
> > > >>> iron out several details like this.
> > > >>> Thanks,
> > > >>> Silviu
> > > >>>
> > > >>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré
> > > >>> <j...@nanthrax.net> wrote:
> > > >>>
> > > >>>> Hi Silviu,
> > > >>>>
> > > >>>> thanks for detailed update and great work !
> > > >>>>
> > > >>>> I would advice to create a:
> > > >>>>
> > > >>>> sdks/python
> > > >>>>
> > > >>>> Maven module to store the Python SDK.
> > > >>>>
> > > >>>> WDYT ?
> > > >>>>
> > > >>>> By the way, welcome aboard and great to have you all guys in
> > > >>>> the team !
> > > >>>>
> > > >>>> Regards
> > > >>>> JB
> > > >>>>
> > > >>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
> > > >>>>
> > > >>>>> Hi all,
> > > >>>>>
> > > >>>>> My name is Silviu Calinoiu and I am a member of the Cloud
> > > >>>>> Dataflow team working on the Python SDK. As the original Beam
> > > >>>>> proposal (https://wiki.apache.org/incubator/BeamProposal)
> > > >>>>> mentioned, we have been planning to merge the Python SDK into
> > > >>>>> Beam. The Python SDK is in an early stage of development
> > > >>>>> (alpha milestone) and so this is a good time to move the code
> > > >>>>> without causing too much disruption to our customers.
> > > >>>>> Additionally, this enables the Beam community to contribute
> > > >>>>> as soon as possible.
> > > >>>>>
> > > >>>>> The current state of the SDK is as follows:
> > > >>>>>
> > > >>>>>   - Open-sourced at
> > > >>>>>     https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
> > > >>>>>
> > > >>>>>   - Model: All main concepts are present.
> > > >>>>>
> > > >>>>>   - I/O: SDK supports text (Google Cloud Storage) and BigQuery
> > > >>>>>     connectors and has a framework for adding additional
> > > >>>>>     sources and sinks.
> > > >>>>>
> > > >>>>>   - Runners: SDK has two pipeline runners: direct runner (in
> > > >>>>>     process, local execution) and Cloud Dataflow runner for
> > > >>>>>     batch pipelines (submit job to Google Dataflow service).
> > > >>>>>     The current direct runner is bounded only (batch
> > > >>>>>     execution) but there is work in progress to support
> > > >>>>>     unbounded (as in Java).
> > > >>>>>
> > > >>>>>   - Testing: The code base has unit test coverage for all the
> > > >>>>>     modules and

Re: Testing and the Capability Matrix

2016-06-14 Thread Thomas Groh
It is also worth noting that this document is a snapshot rather than the
long-term plan. As the SDK evolves, the annotations will almost certainly
change with it (and will certainly expand).

+Aljoscha

For streaming/batch execution separation, this is better served by
configuration in the runner's build (e.g. specifying two separate
executions in the pom.xml, one for streaming and one for batch). Given that
the tests live in a separate module from the runner, this is similar to how
RunnableOnService tests are currently executed by all of the runners.

For sink, I think given the current implementations of sink there isn't a
huge need; however, most sinks should be annotated with some form of
superclass (although the implementation of sink requires side inputs, so
this is also worth considering).

+jb

These would live on the tests proper, yes.
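
To make the annotation idea concrete, here is a minimal sketch of
capability-tagged tests and runner-side selection. It is written in Python
for brevity, whereas the proposal itself concerns test categories such as
RunnableOnService in the Java SDK. The requires decorator, the select_tests
helper and the capability names are all hypothetical and are not taken from
the linked document.

```python
def requires(*capabilities):
    """Tag a test function with the capabilities it needs (hypothetical)."""
    def decorate(test_fn):
        test_fn.required_capabilities = frozenset(capabilities)
        return test_fn
    return decorate


@requires("ParDo")
def test_pardo():
    return True


@requires("ParDo", "SideInputs")
def test_side_inputs():
    return True


@requires("Windowing", "Triggers")
def test_triggers():
    return True


def select_tests(tests, runner_capabilities):
    """Keep only the tests whose requirements the runner fully supports."""
    supported = frozenset(runner_capabilities)
    return [t for t in tests if t.required_capabilities <= supported]


ALL_TESTS = [test_pardo, test_side_inputs, test_triggers]

# An imaginary runner that supports ParDo and side inputs but not triggers.
batch_runner = {"ParDo", "SideInputs"}
selected = [t.__name__ for t in select_tests(ALL_TESTS, batch_runner)]
print(selected)  # ['test_pardo', 'test_side_inputs']
```

A runner that declares more capabilities picks up more of the shared tests
without any change to the tests themselves.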

On Sun, Jun 12, 2016 at 11:05 PM, Jean-Baptiste Onofré 
wrote:

> Hi Thomas,
>
> it looks good to me.
>
> Just curious: the proposed annotations will be directly in the Java SDK
> Test jar right ?
>
> Thanks,
> Regards
> JB
>
>
> On 06/11/2016 01:34 AM, Thomas Groh wrote:
>
>> Hey Beamers!
>>
>> We have a lovely Capability Matrix (
>> http://beam.incubator.apache.org/capability-matrix/) which describes what
>> runners can do, and what's in the model. However, right now we only have
>> one way to specify that a test is useful to be executed in a runner, the
>> RunnableOnService category.
>>
>> I've worked on a document to expand the number of annotations to be more
>> in
>> line with the capability matrix, which should help runner writers test
>> more
>> precisely with regards to the Beam model. The document is located at
>>
>> https://docs.google.com/document/d/1fICxq32t9yWn9qXhmT07xpclHeHX2VlUyVtpi2WzzGM/edit?usp=sharing
>> ,
>> and I've added edit access for all of our committers.
>>
>> Feel free to take a look and leave any comments you may have,
>>
>> Thanks,
>>
>> Thomas
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: newbie question about beam

2016-06-14 Thread Jean-Baptiste Onofré

Hi Sergio,

Welcome aboard, and it was good to talk with you at ApacheCon.

Distribution of resources is a runner concern, and more specifically
a matter for the runner's execution environment. Each runner/backend
will implement its own logic.

I don't know Keras well enough to give strong advice.

Regarding the Python SDK, we discussed that last week: it's on the
way. We should have the Python SDK very soon (we were busy with the
first release).


Regards
JB

On 06/14/2016 12:38 PM, Sergio Fernández wrote:

Hi guys,

I'm a newbie in the Beam community, but as someone who has used Dataflow in
the past I've been following the podling since you came to the ASF. I'm very
happy to see that 0.1.0-incubating is finally going out; congratulations
on such a great milestone.

I talked with some of you at the last ApacheCon, and it was good to hear
that the Python SDK was just a matter of time and should come to Beam at
some point. So, coming back to the original plans
<http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>,
do you have a timeline for bringing the Python SDK to Beam?

I'd also like to ask how Beam plans to deal with the distribution of
resources across all nodes, something I know is not really clean with
some runners (e.g., Spark). More concretely, we're using Keras
<http://keras.io/>, a deep learning Python library that is capable of
running on top of either TensorFlow or Theano. Historically, I know
Dataflow and TensorFlow have not been very compatible, but I wonder if
the project has already discussed how to support running Keras
(TensorFlow) tasks on Beam. For us it is more about querying than
training, so I'd like to know if the Beam Model could natively support
the distribution of the models (sometimes several GB).

Thanks in advance.

Cheers,



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


newbie question about beam

2016-06-14 Thread Sergio Fernández
Hi guys,

I'm a newbie in the Beam community, but as someone who has used Dataflow in
the past I've been following the podling since you came to the ASF. I'm very
happy to see that 0.1.0-incubating is finally going out; congratulations
on such a great milestone.

I talked with some of you at the last ApacheCon, and it was good to hear
that the Python SDK was just a matter of time and should come to Beam at
some point. So, coming back to the original plans
<http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>,
do you have a timeline for bringing the Python SDK to Beam?

I'd also like to ask how Beam plans to deal with the distribution of
resources across all nodes, something I know is not really clean with
some runners (e.g., Spark). More concretely, we're using Keras
<http://keras.io/>, a deep learning Python library that is capable of
running on top of either TensorFlow or Theano. Historically, I know
Dataflow and TensorFlow have not been very compatible, but I wonder if
the project has already discussed how to support running Keras
(TensorFlow) tasks on Beam. For us it is more about querying than
training, so I'd like to know if the Beam Model could natively support
the distribution of the models (sometimes several GB).

Thanks in advance.

Cheers,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co