PPMC

2016-02-04 Thread Tyler Akidau
Hello Beamers!

To summarize a discussion that started while infrastructure was being set
up:

   - Google folks proposed a PPMC composed of a subset of the initial
   committers.
   - Multiple folks pointed out that the rules say the PPMC is to be
   initial committers + mentors.
   - Multiple folks also noted they would be fine with the more limited
   PPMC.
   - Google folks agreed a PPMC of initial committers + mentors is what the
   rules state, so we should go with that.

Please feel free to jump in and correct anything I've misrepresented in my
summary, or if you feel there's anything further to discuss. Given that
we're simply going with what is stated in the rules, seems there is no need
for a vote of any kind.

-Tyler


List Archives

2016-02-04 Thread Tyler Akidau
Anyone know what we need to do to get all the mailing list archives listed
here? http://mail-archives.apache.org/mod_mbox/

I only see the commits archive currently.

-Tyler


Re: List Archives

2016-02-04 Thread Tyler Akidau
Great, thank you!

On Thu, Feb 4, 2016 at 11:34 PM Jean-Baptiste Onofré 
wrote:

> Hi Tyler,
>
> it's because, we need at least one message on the list to be on
> mod_mbox. So, we should have it soon ;)
>
> commits list is there because I created Jira, triggering e-mails on this
> list.
>
> The sync takes something like a hour.
>
> Regards
> JB
>
> On 02/05/2016 08:20 AM, Tyler Akidau wrote:
> > Anyone know what we need to do to get all the mailing list archives
> listed
> > here? http://mail-archives.apache.org/mod_mbox/
> >
> > I only see the commits archive currently.
> >
> > -Tyler
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: PPMC

2016-02-05 Thread Tyler Akidau
Yeah, it is here too. And in the USA I think there's generally a positive
association with that term due to BMW being seen as more of a luxury
brand. Does that impression hold in other areas of the world as well?

On Fri, Feb 5, 2016 at 9:09 AM Aljoscha Krettek  wrote:

> Btw, a beamer is also a BMW car. :-) (at least in britain)
> > On 05 Feb 2016, at 17:32, Amit Sela  wrote:
> >
> > So a Beamer is a member of The BeamTeam ?
> > ;)
> >
> > On Fri, Feb 5, 2016, 18:21 Frances Perry  wrote:
> >
> >> On Fri, Feb 5, 2016 at 7:23 AM, Jean-Baptiste Onofré 
> >> wrote:
> >>
> >>> By the way Tyler, it's the first time that I'm named "beamer" ;)
> >>>
> >>> Got bunch of "bird names" during rugby games, but never beamer. I'm
> proud
> >>> of this one ;)
> >>>
> >>
> >> Not quite as much fun as "Dataflower", but it'll do ;-)
> >>
>
>


Re: PPMC

2016-02-05 Thread Tyler Akidau
Strange, I see that too. JB: any ideas? I just subscribed by emailing
"subscribe" to the subscription alias. Is there something else we're
supposed to do?

-Tyler

On Fri, Feb 5, 2016 at 10:02 AM Aljoscha Krettek 
wrote:

> @google people: Your mail addresses are all “*@google.com.invalid”. Maybe
> this just happens in my client or maybe it’s some problem with the mailing
> list.
> > On 05 Feb 2016, at 18:49, Tyler Akidau 
> wrote:
> >
> > Yeah, it is here too. And in the USA I think there's generally a positive
> > association with that term due to BMW being seen as a more of a luxury
> > brand. Does that impression hold in other areas of the world as well?
> >
> > On Fri, Feb 5, 2016 at 9:09 AM Aljoscha Krettek 
> wrote:
> >
> >> Btw, a beamer is also a BMW car. :-) (at least in britain)
> >>> On 05 Feb 2016, at 17:32, Amit Sela  wrote:
> >>>
> >>> So a Beamer is a member of The BeamTeam ?
> >>> ;)
> >>>
> >>> On Fri, Feb 5, 2016, 18:21 Frances Perry 
> wrote:
> >>>
> >>>> On Fri, Feb 5, 2016 at 7:23 AM, Jean-Baptiste Onofré  >
> >>>> wrote:
> >>>>
> >>>>> By the way Tyler, it's the first time that I'm named "beamer" ;)
> >>>>>
> >>>>> Got bunch of "bird names" during rugby games, but never beamer. I'm
> >> proud
> >>>>> of this one ;)
> >>>>>
> >>>>
> >>>> Not quite as much fun as "Dataflower", but it'll do ;-)
> >>>>
> >>
> >>
>
>


Re: PPMC

2016-02-05 Thread Tyler Akidau
:-)

Trying from my shiny new @apache.org address, to see if that's somehow
different.

-Tyler

On Fri, Feb 5, 2016 at 10:25 AM Dan Halperin 
wrote:

> This is how Tyler found out he got fired. Oh crap, me too.
>
> On Fri, Feb 5, 2016 at 10:24 AM, Tyler Akidau 
> wrote:
>
> > Strange, I see that too. JB: any ideas? I just subscribed by emailing
> > "subscribe" to the subscription alias. Is there something else we're
> > supposed to do?
> >
> > -Tyler
> >
> > On Fri, Feb 5, 2016 at 10:02 AM Aljoscha Krettek 
> > wrote:
> >
> > > @google people: Your mail addresses are all “*@google.com.invalid”.
> Maybe
> > > this just happens in my client or maybe it’s some problem with the
> > mailing
> > > list.
> > > > On 05 Feb 2016, at 18:49, Tyler Akidau 
> > > wrote:
> > > >
> > > > Yeah, it is here too. And in the USA I think there's generally a
> > positive
> > > > association with that term due to BMW being seen as a more of a
> luxury
> > > > brand. Does that impression hold in other areas of the world as well?
> > > >
> > > > On Fri, Feb 5, 2016 at 9:09 AM Aljoscha Krettek  >
> > > wrote:
> > > >
> > > >> Btw, a beamer is also a BMW car. :-) (at least in britain)
> > > >>> On 05 Feb 2016, at 17:32, Amit Sela  wrote:
> > > >>>
> > > >>> So a Beamer is a member of The BeamTeam ?
> > > >>> ;)
> > > >>>
> > > >>> On Fri, Feb 5, 2016, 18:21 Frances Perry 
> > > wrote:
> > > >>>
> > > >>>> On Fri, Feb 5, 2016 at 7:23 AM, Jean-Baptiste Onofré <
> > j...@nanthrax.net
> > > >
> > > >>>> wrote:
> > > >>>>
> > > >>>>> By the way Tyler, it's the first time that I'm named "beamer" ;)
> > > >>>>>
> > > >>>>> Got bunch of "bird names" during rugby games, but never beamer.
> I'm
> > > >> proud
> > > >>>>> of this one ;)
> > > >>>>>
> > > >>>>
> > > >>>> Not quite as much fun as "Dataflower", but it'll do ;-)
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>


Re: Apache Storm Runner for Beam?

2016-02-09 Thread Tyler Akidau
The two runner efforts I'm aware of currently are Flink and Spark. I'm not
aware of any effort to build a Storm runner, but it would certainly be
welcome.

-Tyler


On Tue, Feb 9, 2016 at 11:08 AM Yang, Connie  wrote:

> Hi Beamers,
>
> We’re considering adding an Apache Storm Runner to Apache Beam.
>
>   *   Has anyone in the community considered this and started working on
> this?
>   *   If so, what’s the current state?
>
> We would like to join the design discussion and development of it if it’s
> in flight.
>
> Thanks
> Connie
>


Re: Spark Runner Technical Vision

2016-02-10 Thread Tyler Akidau
+1. Added a bunch of comments. Thanks!

On Wed, Feb 10, 2016 at 9:35 PM Jean-Baptiste Onofré 
wrote:

> Thanks Amit.
>
> Regards
> JB
>
> On 02/10/2016 07:46 PM, Amit Sela wrote:
> > Following the technical vision Frances sent, and following the previous
> > document I wrote on the spark runner gaps, I'm attaching a suggested
> > technical vision to the Spark runner.
> >
> > I tried to be inline with the project's technical vision both in
> timeline,
> > and in document style. I suggest that once the ASF Wiki is up, we'll have
> > roadmap for runners as well.
> >
> > Thanks,
> > Amit
> >
> >
> https://docs.google.com/document/d/1y4qlQinjjrusGWlgq-mYmbxRW2z7-_X5Xax-GG0YsC0/edit?usp=sharing
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Spark Runner Technical Vision

2016-02-10 Thread Tyler Akidau
+1

Although, should/could we have a shared folder to store things in as well
(since we probably don't want to be linking everything from the website,
but it'd be nice to have a consistent place for all relevant docs to be
found)?

-Tyler

On Wed, Feb 10, 2016 at 10:51 PM James Malone
 wrote:

> Any objections if I link to this (and other design) doc on the (upcoming)
> project site (beam.incubator.apache.org)?
>
> On Wed, Feb 10, 2016 at 10:31 PM, Tyler Akidau  >
> wrote:
>
> > +1. Added a bunch of comments. Thanks!
> >
> > On Wed, Feb 10, 2016 at 9:35 PM Jean-Baptiste Onofré 
> > wrote:
> >
> > > Thanks Amit.
> > >
> > > Regards
> > > JB
> > >
> > > On 02/10/2016 07:46 PM, Amit Sela wrote:
> > > > Following the technical vision Frances sent, and following the
> previous
> > > > document I wrote on the spark runner gaps, I'm attaching a suggested
> > > > technical vision to the Spark runner.
> > > >
> > > > I tried to be inline with the project's technical vision both in
> > > timeline,
> > > > and in document style. I suggest that once the ASF Wiki is up, we'll
> > have
> > > > roadmap for runners as well.
> > > >
> > > > Thanks,
> > > > Amit
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1y4qlQinjjrusGWlgq-mYmbxRW2z7-_X5Xax-GG0YsC0/edit?usp=sharing
> > > >
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>


Re: Apache Beam blog

2016-02-12 Thread Tyler Akidau
+1

-Tyler

On Fri, Feb 12, 2016 at 9:57 AM Amit Sela  wrote:

> +1
>
> I think we could also publish user's use-case examples and stories. "How we
> are using Beam" or something like that.
>
> On Fri, Feb 12, 2016, 19:49 James Malone 
> wrote:
>
> > Hello everyone,
> >
> > Now that we have a skeleton website up (horray!) I wanted to raise the
> idea
> > of a "Beam Blog." I am thinking this blog is where we can show news, cool
> > new Beam-things, examples for Apache Beam, and so on. This blog would
> live
> > under the project website (http://beam.incubator.apache.org).
> >
> > To that end, I would like to poll how the larger community feels about
> > this. Additionally, if people think it's a good idea, I'd like to start
> > thinking about content which we would want to feature on the blog.
> >
> > I happily volunteer to coordinate, review, and assist with the blog.
> >
> > Cheers!
> >
> > James
> >
>


Capability Matrix

2016-03-09 Thread Tyler Akidau
I just filed BEAM-104 <https://issues.apache.org/jira/browse/BEAM-104>
regarding publishing a capability matrix on the Beam website. We've seeded
the spreadsheet linked there (
https://docs.google.com/spreadsheets/d/1OM077lZBARrtUi6g0X0O0PHaIbFKCD6v0djRefQRE1I/edit
)
with an initial proposed set of capabilities, as well as descriptions for
the model and Cloud Dataflow. If folks for other runners (currently Flink
and Spark) could please make sure their columns are filled out as well,
it'd be much appreciated. Also let us know if there are capabilities you
think we've missed.

Our hope is to get this up and published soon, since we've been getting a
lot of questions regarding runner capabilities, portability, etc.

-Tyler


Re: Capability Matrix

2016-03-12 Thread Tyler Akidau
Thanks all! At this point, it looks like almost all of the fields have been
filled out. I'm in the process of migrating the spreadsheet contents to
YAML within the website source, so I've revoked edit access from the doc to
keep things from changing while I'm doing that. If you have further edits
to make, feel free to leave a comment, and I'll incorporate it into the
YAML.

-Tyler


On Thu, Mar 10, 2016 at 12:43 AM Jean-Baptiste Onofré 
wrote:

> Hi Tyler,
>
> good idea !
>
> I like it !
>
> Regards
> JB
>
> On 03/09/2016 11:14 PM, Tyler Akidau wrote:
> > I just filed BEAM-104 <https://issues.apache.org/jira/browse/BEAM-104>
> > regarding publishing a capability matrix on the Beam website. We've
> seeded
> > the spreadsheet linked there (
> >
> https://docs.google.com/spreadsheets/d/1OM077lZBARrtUi6g0X0O0PHaIbFKCD6v0djRefQRE1I/edit
> > )
> > with an initial proposed set of capabilities, as well as descriptions for
> > the model and Cloud Dataflow. If folks for other runners (currently Flink
> > and Spark) could please make sure their columns are filled out as well,
> > it'd be much appreciated. Also let us know if there are capabilities you
> > think we've missed.
> >
> > Our hope is to get this up and published soon, since we've been getting a
> > lot of questions regarding runner capabilities, portability, etc.
> >
> > -Tyler
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Capability Matrix

2016-03-20 Thread Tyler Akidau
Just pushed the capability matrix and an attendant blog post to the site:

   - Blog post:
   http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html
   - Matrix: http://beam.incubator.apache.org/capability-matrix/

For those of you that want to keep the matrix up to date as your runner
evolves, you'll want to make updates in the _data/capability-matrix.yml
file:
https://github.com/apache/incubator-beam-site/blob/asf-site/_data/capability-matrix.yml

Thanks to everyone for helping fill out the initial set of capabilities!
Looking forward to updates as things progress. :-)

And thanks also to Max for moving all the website stuff to git!

-Tyler


On Sat, Mar 12, 2016 at 9:37 AM Tyler Akidau  wrote:

> Thanks all! At this point, it looks like most all of the fields have been
> filled out. I'm in the process of migrating the spreadsheet contents to
> YAML within the website source, so I've revoked edit access from the doc to
> keep things from changing while I'm doing that. If you have further edits
> to make, feel free to leave a comment, and I'll incorporate it into the
> YAML.
>
> -Tyler
>
>
> On Thu, Mar 10, 2016 at 12:43 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Tyler,
>>
>> good idea !
>>
>> I like it !
>>
>> Regards
>> JB
>>
>> On 03/09/2016 11:14 PM, Tyler Akidau wrote:
>> > I just filed BEAM-104 <https://issues.apache.org/jira/browse/BEAM-104>
>> > regarding publishing a capability matrix on the Beam website. We've
>> seeded
>> > the spreadsheet linked there (
>> >
>> https://docs.google.com/spreadsheets/d/1OM077lZBARrtUi6g0X0O0PHaIbFKCD6v0djRefQRE1I/edit
>> > )
>> > with an initial proposed set of capabilities, as well as descriptions
>> for
>> > the model and Cloud Dataflow. If folks for other runners (currently
>> Flink
>> > and Spark) could please make sure their columns are filled out as well,
>> > it'd be much appreciated. Also let us know if there are capabilities you
>> > think we've missed.
>> >
>> > Our hope is to get this up and published soon, since we've been getting
>> a
>> > lot of questions regarding runner capabilities, portability, etc.
>> >
>> > -Tyler
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


Re: Capability Matrix

2016-03-21 Thread Tyler Akidau
Thanks, all!

On Fri, Mar 18, 2016 at 6:46 AM Amit Sela  wrote:

> Looks great!
> I think it's the best way to give a clear picture of capabilities for users
> and runner developers.
> And as always, Love the colours ;)
>
>
> On Fri, Mar 18, 2016 at 3:33 PM Kostas Kloudas <
> k.klou...@data-artisans.com>
> wrote:
>
> > Great to have an overview of the available
> > runners and a comprehensible visualization of
> > the features each one supports!
> >
> > Kostas
> >
> > > On Mar 18, 2016, at 11:32 AM, Maximilian Michels 
> wrote:
> > >
> > > Well done. The matrix provides a good basis for improving the existing
> > > runners. Moreover, new backends can use it to evaluate capabilities
> > > for creating a runner.
> > >
> > > On Fri, Mar 18, 2016 at 1:15 AM, Jean-Baptiste Onofré  >
> > wrote:
> > >> Catcha, thanks !
> > >>
> > >> Regards
> > >> JB
> > >>
> > >>
> > >> On 03/18/2016 12:51 AM, Frances Perry wrote:
> > >>>
> > >>> That's "partially". Check out the full matrix for complete details:
> > >>> http://beam.incubator.apache.org/capability-matrix/
> > >>>
> > >>> On Thu, Mar 17, 2016 at 4:50 PM, Jean-Baptiste Onofré <
> j...@nanthrax.net
> > >
> > >>> wrote:
> > >>>
> > >>>> Great job !
> > >>>>
> > >>>> By the way, when you use ~ in the matrix, does it mean that it works
> > only
> > >>>> in some cases (depending of the pipeline or transform) or it doesn't
> > work
> > >>>> as expected ? Just curious for the Aggregators and the meaning in
> the
> > >>>> Beam
> > >>>> Model.
> > >>>>
> > >>>> Thanks,
> > >>>> Regards
> > >>>> JB
> > >>>>
> > >>>>
> > >>>> On 03/18/2016 12:45 AM, Tyler Akidau wrote:
> > >>>>
> > >>>>> Just pushed the capability matrix and an attendant blog post to the
> > >>>>> site:
> > >>>>>
> > >>>>> - Blog post:
> > >>>>>
> > >>>>>
> > >>>>>
> >
> http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html
> > >>>>> - Matrix: http://beam.incubator.apache.org/capability-matrix/
> > >>>>>
> > >>>>> For those of you that want to keep the matrix up to date as your
> > runner
> > >>>>> evolves, you'll want to make updates in the
> > _data/capability-matrix.yml
> > >>>>> file:
> > >>>>>
> > >>>>>
> > >>>>>
> >
> https://github.com/apache/incubator-beam-site/blob/asf-site/_data/capability-matrix.yml
> > >>>>>
> > >>>>> Thanks to everyone for helping fill out the initial set of
> > capabilities!
> > >>>>> Looking forward to updates as things progress. :-)
> > >>>>>
> > >>>>> And thanks also to Max for moving all the website stuff to git!
> > >>>>>
> > >>>>> -Tyler
> > >>>>>
> > >>>>>
> > >>>>> On Sat, Mar 12, 2016 at 9:37 AM Tyler Akidau 
> > wrote:
> > >>>>>
> > >>>>> Thanks all! At this point, it looks like most all of the fields
> have
> > >>>>> been
> > >>>>>>
> > >>>>>> filled out. I'm in the process of migrating the spreadsheet
> > contents to
> > >>>>>> YAML within the website source, so I've revoked edit access from
> the
> > >>>>>> doc
> > >>>>>> to
> > >>>>>> keep things from changing while I'm doing that. If you have
> further
> > >>>>>> edits
> > >>>>>> to make, feel free to leave a comment, and I'll incorporate it
> into
> > the
> > >>>>>> YAML.
> > >>>>>>
> > >>>>>> -Tyler
> > >>>>>>
> > >>>>>>
> > >>>>>> On Thu, Mar 10, 2016 at 12:43 AM Jean-Baptiste 

Re: BEAM-206

2016-05-13 Thread Tyler Akidau
+Daniel Halperin 

On Thu, May 12, 2016 at 10:20 AM Roland Harangozo  wrote:

> Hi All,
>
> I would like to fix this issue:
> https://issues.apache.org/jira/browse/BEAM-206
>
> Could you please revise my design proposal?
>
> I would copy and optionally remove the temporary files one by one as an
> atomic operation rather than copying all of the temporary files and then
> removing them (if we need to remove). It has the following benefits:
> * If the move operation is supported by the file system and the file retention
> is remove, we can use the native file move operation (or rename). Could be
> significantly faster than the copy and remove.
> * By moving the remove operation close to the copy operation, the
> probability is lower to copy the file again because of any failure (if one
> file of two is moved but the other one failed, when we replay, it moves
> only the one that failed rather than starting from scratch)
>
> Regarding the concurrency, I would use an ExecutorService to run the
> aforementioned operation simultaneously. The first exception would stop
> (interrupt) all operation.
>
> The level of the concurrency (number of threads) would be file system
> specific and configurable. I can imagine 10+ threads gives a good
> performance on GCS but gives bad performance on local file system.
>
> Best regards,
> Roland Harangozo
>
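
A minimal sketch of the per-file approach proposed above, written against plain java.nio rather than any Beam file abstraction. The class TempFileFinalizer and its methods moveOne and moveAll are hypothetical names introduced only for illustration; the thread count is a caller-supplied parameter, as the proposal suggests it should be file-system specific.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not part of Beam): finalizes temporary files one by one. */
class TempFileFinalizer {

  /** Moves one temporary file to its destination; safe to call again on replay. */
  static void moveOne(Path source, Path destination) throws IOException {
    // On replay, a file that was already moved has no source left, so skip it.
    if (Files.notExists(source) && Files.exists(destination)) {
      return;
    }
    try {
      // Prefer the file system's native rename when it is available.
      Files.move(source, destination, StandardCopyOption.ATOMIC_MOVE);
    } catch (AtomicMoveNotSupportedException e) {
      // Fall back to copy followed by delete, still handled per file.
      Files.copy(source, destination, StandardCopyOption.REPLACE_EXISTING);
      Files.delete(source);
    }
  }

  /** Runs all moves on a fixed-size pool; the first failure cancels the rest. */
  static void moveAll(Map<Path, Path> moves, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    ExecutorCompletionService<Void> completion = new ExecutorCompletionService<>(pool);
    List<Future<Void>> futures = new ArrayList<>();
    try {
      for (Map.Entry<Path, Path> move : moves.entrySet()) {
        futures.add(completion.submit(() -> {
          moveOne(move.getKey(), move.getValue());
          return null;
        }));
      }
      // Consume results in completion order so a failure is seen promptly.
      for (int i = 0; i < futures.size(); i++) {
        try {
          completion.take().get();
        } catch (ExecutionException e) {
          // Interrupt everything still queued or running, then surface the cause.
          for (Future<Void> pending : futures) {
            pending.cancel(true);
          }
          Throwable cause = e.getCause();
          if (cause instanceof IOException) {
            throw (IOException) cause;
          }
          throw new IOException(cause);
        }
      }
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Because each file is finalized independently, a replay after a partial failure only touches the files whose sources still exist. Whether ATOMIC_MOVE replaces an existing destination is implementation specific, so a real implementation would need to decide how to treat half-written destinations.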


Re: machine learning API, common models

2016-05-13 Thread Tyler Akidau
Thanks a lot, Kam. Can you please enable comment access on the doc? I seem
to have view access only.

-Tyler

On Fri, May 13, 2016 at 9:54 AM Kam Kasravi  wrote:

> Hi
>
> A number of readers have made comments on this topic recently. We have
> created a document that does some analysis of common ML models and related
> APIs. We hope this can drive an approach that will result in an API,
> compatibility matrix and involvement from the same groups that are
> implementing transformation runners (spark, flink, etc). We welcome
> comments here or in the document itself.
>
>
> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
>


Re: machine learning API, common models

2016-05-13 Thread Tyler Akidau
Hi Kam & Soila,

Thanks a lot for writing this up. I ran the doc past some of the folks
who've been doing ML work here at Google, and they were generally happy
with the distillation of common methods in the doc. I'd be curious to hear
what folks on the Flink- and Spark- runner sides think.

To me, this seems like a good direction for a high-level API. Presumably,
once a high-level API is in place, we could begin looking at what it would
take to add lower-level ML algorithm support (e.g. iterative) to the Beam
Model. Is this essentially what you're thinking?

Some more specific questions/comments:

   - Presumably you'd want to tackle this in Java first, since that's the
   only language we currently support? Given that half of your examples are in
   Python, I'm also assuming Python will be interesting once it's available.

   - Along those lines, what languages are represented in the capability
   matrix? E.g. is Spark ML support as detailed there identical across
   Java/Scala and Python?

   - Have you thought about how this would tie in at the runner level,
   particularly given the updated Runner API changes that are coming? I'm
   assuming they'd be provided as composite transforms that (for now) would
   have no default implementation, given the lack of low-level primitives for
   ML algorithms, but am curious what your thoughts are there.

   - I still don't fully understand how incremental updates due to model
   drift would tie in at the API level. There's a comment thread in the doc
   still open tracking this, so no need to comment here additionally. Just
   pointing it out as one of the things that stands out as potentially having
   API-level impacts to me that doesn't seem 100% fleshed out in the doc yet
   (though that admittedly may just be my limited understanding at this point
   :-).
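
To make the third point concrete, here is a rough sketch of a composite transform with no default implementation that a runner overrides. Everything in it is hypothetical and deliberately not the Beam SDK: MlTransform, Mean, and SketchRunner are stand-in names, and Mean is only a trivial placeholder for a real ML algorithm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a composite ML transform; not a Beam API. */
abstract class MlTransform<I, O> {
  abstract String name();

  /** No default expansion: the model lacks the low-level primitives for it. */
  O expand(I input) {
    throw new UnsupportedOperationException(
        name() + " has no default implementation; a runner must provide one");
  }
}

/** Trivial placeholder for a real ML algorithm. */
class Mean extends MlTransform<List<Double>, Double> {
  @Override
  String name() {
    return "Mean";
  }
}

/** Hypothetical runner that swaps in native implementations by transform name. */
class SketchRunner {
  private final Map<String, Function<List<Double>, Double>> overrides = new HashMap<>();

  void override(String name, Function<List<Double>, Double> impl) {
    overrides.put(name, impl);
  }

  Double apply(MlTransform<List<Double>, Double> transform, List<Double> input) {
    Function<List<Double>, Double> impl = overrides.get(transform.name());
    // Use the runner's implementation if registered, else the (absent) default.
    return impl != null ? impl.apply(input) : transform.expand(input);
  }
}
```

A runner with no override registered fails loudly on such a transform, which is the "for now" behavior described above; a runner with native ML support would register its own implementation.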

-Tyler




On Fri, May 13, 2016 at 10:48 AM Kam Kasravi  wrote:

> Hi Tyler - my bad. Comments should be enabled now.
>
> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau  >
> wrote:
>
> > Thanks a lot, Kam. Can you please enable comment access on the doc? I
> seem
> > to have view access only.
> >
> > -Tyler
> >
> > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi 
> wrote:
> >
> > > Hi
> > >
> > > A number of readers have made comments on this topic recently. We have
> > > created a document that does some analysis of common ML models and
> > related
> > > APIs. We hope this can drive an approach that will result in an API,
> > > compatibility matrix and involvement from the same groups that are
> > > implementing transformation runners (spark, flink, etc). We welcome
> > > comments here or in the document itself.
> > >
> > >
> > >
> >
> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > >
> >
>


Re: machine learning API, common models

2016-05-17 Thread Tyler Akidau
On Sat, May 14, 2016 at 4:53 AM Kavulya, Soila P 
wrote:

> Hi Tyler,
>
> Thank you so much for your feedback. I agree that starting with the
> high-level API is a good direction. We are interested in Python because it
> is the language that our data scientists are most familiar with. I think
> starting with Java would be the best approach, because the Python API can
> be a thin wrapper for Java API.
>
> In Spark, the Scala, Java and Python APIs are identical. Flink does not
> have a Python API for ML pipelines at present.
>
> Could you point me to the updated runner API?
>

Sorry for the delay; I've been traveling. The runner API proposal is here:
https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit

-Tyler


>
> Soila
>
> -----Original Message-----
> From: Tyler Akidau [mailto:taki...@google.com.INVALID]
> Sent: Friday, May 13, 2016 6:34 PM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Hi Kam & Soila,
>
> Thanks a lot for writing this up. I ran the doc past some of the folks
> who've been doing ML work here at Google, and they were generally happy
> with the distillation of common methods in the doc. I'd be curious to hear
> what folks on the Flink- and Spark- runner sides think.
>
> To me, this seems like a good direction for a high-level API. Presumably,
> once a high-level API is in place, we could begin looking at what it would
> take to add lower-level ML algorithm support (e.g. iterative) to the Beam
> Model. Is this essentially what you're thinking?
>
> Some more specific questions/comments:
>
>    - Presumably you'd want to tackle this in Java first, since that's the
>    only language we currently support? Given that half of your examples
> are in
>    Python, I'm also assuming Python will be interesting once it's
> available.
>
>    - Along those lines, what languages are represented in the capability
>    matrix? E.g. is Spark ML support as detailed there identical across
>    Java/Scala and Python?
>
>    - Have you thought about how this would tie in at the runner level,
>    particularly given the updated Runner API changes that are coming? I'm
>    assuming they'd be provided as composite transforms that (for now) would
>    have no default implementation, given the lack of low-level primitives
> for
>    ML algorithms, but am curious what your thoughts are there.
>
>    - I still don't fully understand how incremental updates due to model
>    drift would tie in at the API level. There's a comment thread in the doc
>    still open tracking this, so no need to comment here additionally. Just
>    pointing it out as one of the things that stands out as potentially
> having
>    API-level impacts to me that doesn't seem 100% fleshed out in the doc
> yet
>    (thought that admittedly may just be my limited understanding at this
> point
>    :-).
>
> -Tyler
>
>
>
>
> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi  wrote:
>
> > Hi Tyler - my bad. Comments should be enabled now.
> >
> > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> >  > >
> > wrote:
> >
> > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > I
> > seem
> > > to have view access only.
> > >
> > > -Tyler
> > >
> > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi 
> > wrote:
> > >
> > > > Hi
> > > >
> > > > A number of readers have made comments on this topic recently. We
> > > > have created a document that does some analysis of common ML
> > > > models and
> > > related
> > > > APIs. We hope this can drive an approach that will result in an
> > > > API, compatibility matrix and involvement from the same groups
> > > > that are implementing transformation runners (spark, flink, etc).
> > > > We welcome comments here or in the document itself.
> > > >
> > > >
> > > >
> > >
> > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4
> > PBECHb-xA/edit?usp=sharing
> > > >
> > >
> >
>


Beam Talks at Flink Forward

2016-06-27 Thread Tyler Akidau
Hello Beamers,

Flink Forward, the Apache Flink conference organized by data Artisans, is
happening again this year in Berlin, September 12-14. If you have a
Beam-related talk you'd like to give there, please submit a proposal at
http://flink-forward.org/submit-your-talk/ in the next day or two. The
Flink community is well positioned to take advantage of Beam, and we should
have space for a number of Beam-related talks in the roster (I'm on the PC,
trying to make sure Beam has a solid presence there :-). Note that anything
Beam-related is a reasonable candidate; doesn't have to be specifically
Flink focused, though that's of course a bonus.

-Tyler


Re: Beam Interview

2016-07-11 Thread Tyler Akidau
+1. Thanks a lot for putting this together. :-)

On Mon, Jul 11, 2016 at 9:33 PM Frances Perry 
wrote:

> Love this, Jesse! And pretty inspired reading the answers so far ;-)
>
> On Mon, Jul 11, 2016 at 1:42 PM, Jesse Anderson 
> wrote:
>
> > Thanks!
> >
> > On Mon, Jul 11, 2016 at 1:02 PM Ismaël Mejía  wrote:
> >
> > > Great Idea, I just added my answers, English is not my native language,
> > so
> > > feel free to edit if you find any grammatical mistakes, sorry.
> > >
> > > Ismael
> > >
> > > On Mon, Jul 11, 2016 at 7:12 PM, Jesse Anderson  >
> > > wrote:
> > >
> > > > I really appreciate the turnout. I'm pleasantly surprised with the
> > varied
> > > > responses I've received.
> > > >
> > > > I plan to publish this post on July 13 at 9 AM PT. If you'd like to
> add
> > > > your input, please do it before that time.
> > > >
> > > > Thanks,
> > > >
> > > > Jesse
> > > >
> > > > On Fri, Jul 8, 2016 at 1:30 PM Amit Sela 
> wrote:
> > > >
> > > > > That's great Jesse!
> > > > >
> > > > > Added my comments.
> > > > >
> > > > > Thanks,
> > > > > Amit
> > > > >
> > > > > On Fri, Jul 8, 2016 at 8:56 PM Shiv Shankar <
> > > shiv.shivshan...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > > I am a User and learner. I just added my view points.
> > > > > >
> > > > > > Thanks
> > > > > > SV
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 8, 2016 at 1:51 AM, Sergio Fernández <
> > wik...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Great idea!
> > > > > > >
> > > > > > > On Fri, Jul 8, 2016 at 7:44 AM, Jean-Baptiste Onofré <
> > > > j...@nanthrax.net>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jesse,
> > > > > > > >
> > > > > > > > good idea. Just complete the doc.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > JB
> > > > > > > >
> > > > > > > >
> > > > > > > > On 07/08/2016 02:18 AM, Jesse Anderson wrote:
> > > > > > > >
> > > > > > > >> I've been thinking about ways to get more Beam information
> out
> > > > there
> > > > > > > >> without too much fuss over getting everything right. I came
> up
> > > > with
> > > > > a
> > > > > > > >> written Q and A that represents the most common questions I
> > get.
> > > > > > > >>
> > > > > > > >> Answering the questions should take 5-10 minutes. I think it
> > > will
> > > > > go a
> > > > > > > >> long
> > > > > > > >> ways towards getting more Beam users.
> > > > > > > >>
> > > > > > > >> 1. Here is the Google Doc link:
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1IQt6FfQI7W4d2QxZm6WwGnZFdA8JmaseKZrMGPu8zgY/edit#
> > > > > > > >> 2. Add your name and initials.
> > > > > > > >> 3. When you answer a question, just prefix it with your
> > > > > initials.
> > > > > > > >>
> > > > > > > >> I really appreciate you taking the time to answer things.
> I'll
> > > > > publish
> > > > > > > the
> > > > > > > >> results of the Q and A on my blog and email out the link
> once
> > > it's
> > > > > up
> > > > > > > >> there.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Jesse
> > > > > > > >>
> > > > > > > >>
> > > > > > > > --
> > > > > > > > Jean-Baptiste Onofré
> > > > > > > > jbono...@apache.org
> > > > > > > > http://blog.nanthrax.net
> > > > > > > > Talend - http://www.talend.com
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Sergio Fernández
> > > > > > > Partner Technology Manager
> > > > > > > Redlink GmbH
> > > > > > > m: +43 6602747925
> > > > > > > e: sergio.fernan...@redlink.co
> > > > > > > w: http://redlink.co
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Including Apex runner in Beam tutorial at Strata - Singapore

2016-11-15 Thread Tyler Akidau
Yes, I'll be giving the tutorial in Singapore w/ Dan Halperin. We'd be
happy to include Apex in the demos as part of that. Let's sync up offline
about what that will entail.

-Tyler


On Tue, Nov 15, 2016 at 10:02 AM Davor Bonaci 
wrote:

> Hi Sandeep,
> It would be great to include the Apex runner as a part of any tutorial
> going forward. I suspect we'll have the 0.4.0-incubating release completed
> just before Strata Singapore, which will be the first release with the Apex
> runner, so that aligns quite nicely.
>
> Are you planning to attend Strata Singapore? If so, I'd encourage you to
> reach out to Tyler Akidau offline, who's leading the tutorial on this
> conference.
>
> Davor
>
> On Tue, Nov 15, 2016 at 7:04 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi Sandeep,
> >
> > Great news !
> >
> > Yes, you can definitely do a demo using the Apex runner. It's what Dan
> > and I are also planning during ApacheCon this week: same Wordcount
> > example running on different execution engines.
> >
> > Maybe this blog post could help you to prepare the demo:
> > http://blog.nanthrax.net/2016/08/apache-beam-in-action-same-code-several-execution-engines/
> >
> > By the way, I will propose a PR to "merge" those blog posts into the
> > Beam website.
> >
> > Regards
> > JB
> >
> >
> > On 11/15/2016 04:00 PM, Sandeep Deshmukh wrote:
> >
> >> Dear Beam Community,
> >>
> >> There is a Beam tutorial at Strata-Singapore. I would like to explore
> >> the possibility of including the Apex runner as a part of that tutorial.
> >> As the Apex runner was recently merged into the master branch of Beam,
> >> it would be of interest to many people.
> >>
> >> Please let us know if we can do so. I can accordingly work on the same.
> >>
> >> Regards,
> >> Sandeep
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [DISCUSS] Graduation to a top-level project

2016-11-22 Thread Tyler Akidau
+1, thanks to everyone who's invested time getting us to this point. :-)

-Tyler

On Tue, Nov 22, 2016 at 10:33 AM Jean-Baptiste Onofré 
wrote:

> Hi,
>
> First of all, I would like to thank the whole team, and especially Davor
> for the great work and commitment to Apache and the community.
>
> Of course, a big +1 to move forward on graduation !
>
> Regards
> JB
>
> On 11/22/2016 07:19 PM, Davor Bonaci wrote:
> > Hi everyone,
> > With all the progress we’ve had recently in Apache Beam, I think it is
> > time we start the discussion about graduation as a new top-level project
> > at the Apache Software Foundation.
> >
> > Graduation means we are a self-sustaining and self-governing community,
> > and ready to be a full participant in the Apache Software Foundation. It
> > does not imply that our community growth is complete or that a particular
> > level of technical maturity has been reached, rather that we are on a
> > solid trajectory in those areas. After graduation, we will still
> > periodically report to, and be overseen by, the ASF Board to ensure
> > continued growth of a healthy community.
> >
> > Graduation is an important milestone for the project. It is also key to
> > further grow the user community: many users (incorrectly) see incubation
> > as a sign of instability and are much less likely to consider us for
> > production use.
> >
> > A way to think about graduation readiness is through the Apache Maturity
> > Model [1]. I think we clearly satisfy all the requirements [2]. It is
> > probably worth emphasizing the recent community growth: over each of the
> > past three months, no single organization contributing to Beam has had
> > more than ~50% of the unique contributors per month [2, see assumptions].
> > That’s a great statistic that shows how much we’ve grown our diversity!
> >
> > Process-wise, graduation consists of drafting a board resolution, which
> > needs to identify the full Project Management Committee, and getting it
> > approved by the community, the Incubator, and the Board. Within the Beam
> > community, most of these discussions and votes have to be on the private@
> > mailing list, but, as usual, we’ll try to keep dev@ updated as much as
> > possible.
> >
> > With that in mind, let’s use this discussion on dev@ for two things:
> > * Collect additional data points on our progress that we may want to
> > present to the Incubator as a part of the proposal to accept our
> > graduation.
> > * Determine whether the community supports graduation. Please reply +1/-1
> > with any additional comments, as appropriate. I’d encourage everyone to
> > participate -- regardless of whether you are an occasional visitor or
> > have a specific role in the project -- we’d love to hear your perspective.
> >
> > Data points so far:
> > * Project’s maturity self-assessment [2].
> > * 1500 pull requests in incubation, which makes us one of the most active
> > projects across all of ASF on this metric.
> > * 3 releases, each driven by a different release manager.
> > * 120+ individual contributors.
> > * 3 new committers added, 2 of which aren’t from the largest
> > organization.
> > * 1027 issues created, 515 resolved.
> > * 442 dev@ emails in October alone, sent by 51 individuals.
> > * 50 user@ emails in the last 30 days, sent by 22 individuals.
> >
> > Thanks!
> >
> > Davor
> >
> > [1] http://community.apache.org/apache-way/apache-project-maturity-model.html
> > [2] http://beam.incubator.apache.org/contribute/maturity-model/
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-07 Thread Tyler Akidau
+1

On Thu, Dec 8, 2016 at 1:10 PM Jean-Baptiste Onofré  wrote:

> +1
>
> Regards
> JB
>
> On 12/07/2016 10:37 PM, Kenneth Knowles wrote:
> > Hi all,
> >
> > I want to bring up another major backwards-incompatible change before it
> > is too late, to resolve [BEAM-438].
> >
> > Summary: Leave PInput.apply the same but rename PTransform.apply to
> > PTransform.expand. I have opened [PR #1538] just for reference (it took
> > 30 seconds using IDE automated refactor).
> >
> > This change affects *PTransform authors* but does *not* affect pipeline
> > authors.
> >
> > This issue was filed a long time ago. It has been a problem many times
> > with actual users since before Beam started incubating. This is what goes
> > wrong (often):
> >
> >    PCollection<...> input = ...
> >    PTransform<PCollection<...>, ...> transform = ...
> >
> >    transform.apply(input)
> >
> > This type checks and even looks perfectly normal. Do you see the error?
> >
> > ... what we need the user to write is:
> >
> >    input.apply(transform)
> >
> > What a confusing difference! After all, the first one type-checks and the
> > first one is how you apply a Function or Predicate or
> > SerializableFunction, etc. But it is broken. With transform.apply(input)
> > the transform is not registered with the pipeline at all.
> >
> > We obviously can't (and don't want to) change the most core way that
> > pipeline authors use Beam, so PInput.apply (aka PCollection.apply) must
> > remain the same. But we do need a way to make it impossible to mix these
> > up.
> >
> > The simplest way I can think of is to choose a new name for the other
> > method involved. Users probably won't write transform.expand(input) since
> > they will never have seen it in any examples, etc. This will just make
> > PTransform authors need to do a global rename, and the type system will
> > direct them to all cases so there is no silent failure possible.
> >
> > What do you think?
> >
> > Kenn
> >
> > [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> > [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> >
> > p.s. there is a really amusing and confusing call chain:
> > PCollection.apply -> Pipeline.applyTransform -> Pipeline.applyInternal ->
> > PipelineRunner.apply -> PTransform.apply
> >
> > After this change and work to get the runner out of the loop, it becomes
> > PCollection.apply -> Pipeline.applyTransform -> PTransform.expand
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
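
For readers skimming the thread, the pitfall Kenn describes can be illustrated with a small toy model. This is only a sketch: MiniPipeline, MiniCollection and MiniTransform are hypothetical stand-ins invented for this illustration and are not Beam's actual classes. The only behaviour modelled is the one stated in the thread, namely that applying through the collection registers the transform with the pipeline, while calling the transform's own method directly does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Toy model only: the Mini* classes are hypothetical stand-ins, not Beam's real API. */
public class ApplyVsExpand {

    /** Stand-in for a pipeline that records every transform applied through it. */
    static final class MiniPipeline {
        final List<String> registered = new ArrayList<>();
    }

    /** Stand-in for PCollection: data that knows which pipeline it belongs to. */
    static final class MiniCollection<T> {
        final MiniPipeline pipeline;
        final List<T> elements;

        MiniCollection(MiniPipeline pipeline, List<T> elements) {
            this.pipeline = pipeline;
            this.elements = elements;
        }

        /** Entry point for pipeline authors: registers the transform, then expands it. */
        <O> MiniCollection<O> apply(MiniTransform<T, O> transform) {
            pipeline.registered.add(transform.name);
            return transform.expand(this);
        }
    }

    /** Stand-in for PTransform: expand() builds the output but registers nothing. */
    static final class MiniTransform<I, O> {
        final String name;
        final Function<I, O> fn;

        MiniTransform(String name, Function<I, O> fn) {
            this.name = name;
            this.fn = fn;
        }

        MiniCollection<O> expand(MiniCollection<I> input) {
            List<O> out = new ArrayList<>();
            for (I element : input.elements) {
                out.add(fn.apply(element));
            }
            return new MiniCollection<>(input.pipeline, out);
        }
    }

    public static void main(String[] args) {
        MiniPipeline pipeline = new MiniPipeline();
        MiniCollection<String> input =
                new MiniCollection<>(pipeline, Arrays.asList("a", "b"));
        MiniTransform<String, String> upper =
                new MiniTransform<String, String>("Upper", s -> s.toUpperCase());

        // Calling the transform directly type-checks but bypasses the pipeline.
        upper.expand(input);
        System.out.println("registered after expand: " + pipeline.registered.size());

        // The supported direction: the collection hands the transform to the pipeline.
        input.apply(upper);
        System.out.println("registered after apply: " + pipeline.registered.size());
    }
}
```

In this toy, both calls return a transformed collection, which is why the mistake is easy to miss. Giving the two methods different names, as proposed, means the wrong direction no longer looks like the call shown in every example.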