Hi Tyler,

Thank you so much for your feedback. I agree that starting with the high-level 
API is a good direction. We are interested in Python because it is the language 
our data scientists are most familiar with. Still, I think starting with Java 
would be the best approach, because the Python API can then be a thin wrapper 
around the Java API.

In Spark, the Scala, Java and Python APIs are identical. Flink does not have a 
Python API for ML pipelines at present.

Could you point me to the updated runner API?

Soila

-----Original Message-----
From: Tyler Akidau [mailto:taki...@google.com.INVALID] 
Sent: Friday, May 13, 2016 6:34 PM
To: dev@beam.incubator.apache.org
Subject: Re: machine learning API, common models

Hi Kam & Soila,

Thanks a lot for writing this up. I ran the doc past some of the folks who've 
been doing ML work here at Google, and they were generally happy with the 
distillation of common methods in the doc. I'd be curious to hear what folks on 
the Flink and Spark runner sides think.

To me, this seems like a good direction for a high-level API. Presumably, once 
a high-level API is in place, we could begin looking at what it would take to 
add lower-level ML algorithm support (e.g. iterative) to the Beam Model. Is 
this essentially what you're thinking?

Some more specific questions/comments:

   - Presumably you'd want to tackle this in Java first, since that's the
   only language we currently support? Given that half of your examples are in
   Python, I'm also assuming Python will be interesting once it's available.

   - Along those lines, what languages are represented in the capability
   matrix? E.g. is Spark ML support as detailed there identical across
   Java/Scala and Python?

   - Have you thought about how this would tie in at the runner level,
   particularly given the updated Runner API changes that are coming? I'm
   assuming they'd be provided as composite transforms that (for now) would
   have no default implementation, given the lack of low-level primitives for
   ML algorithms, but am curious what your thoughts are there.

   - I still don't fully understand how incremental updates due to model
   drift would tie in at the API level. There's a comment thread in the doc
   still open tracking this, so no need to comment here additionally. Just
   pointing it out as one of the things that stands out as potentially having
   API-level impacts to me that doesn't seem 100% fleshed out in the doc yet
   (though that admittedly may just be my limited understanding at this point
   :-).

-Tyler




On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <kamkasr...@gmail.com> wrote:

> Hi Tyler - my bad. Comments should be enabled now.
>
> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau <taki...@google.com.invalid>
> wrote:
>
> > Thanks a lot, Kam. Can you please enable comment access on the doc? I seem
> > to have view access only.
> >
> > -Tyler
> >
> > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > A number of readers have made comments on this topic recently. We
> > > have created a document that does some analysis of common ML
> > > models and related APIs. We hope this can drive an approach that
> > > will result in an API, compatibility matrix and involvement from the
> > > same groups that are implementing transformation runners (Spark,
> > > Flink, etc.). We welcome comments here or in the document itself.
> > >
> > >
> > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > >
> >
>
