On Sat, May 14, 2016 at 4:53 AM Kavulya, Soila P <soila.p.kavu...@intel.com> wrote:
> Hi Tyler,
>
> Thank you so much for your feedback. I agree that starting with the
> high-level API is a good direction. We are interested in Python because it
> is the language that our data scientists are most familiar with. I think
> starting with Java would be the best approach, because the Python API can
> be a thin wrapper for Java API.
>
> In Spark, the Scala, Java and Python APIs are identical. Flink does not
> have a Python API for ML pipelines at present.
>
> Could you point me to the updated runner API?
>

Sorry for the delay; I've been traveling. The runner API proposal is here:

https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit

-Tyler

>
> Soila
>
> -----Original Message-----
> From: Tyler Akidau [mailto:taki...@google.com.INVALID]
> Sent: Friday, May 13, 2016 6:34 PM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Hi Kam & Soila,
>
> Thanks a lot for writing this up. I ran the doc past some of the folks
> who've been doing ML work here at Google, and they were generally happy
> with the distillation of common methods in the doc. I'd be curious to hear
> what folks on the Flink- and Spark- runner sides think.
>
> To me, this seems like a good direction for a high-level API. Presumably,
> once a high-level API is in place, we could begin looking at what it would
> take to add lower-level ML algorithm support (e.g. iterative) to the Beam
> Model. Is this essentially what you're thinking?
>
> Some more specific questions/comments:
>
> - Presumably you'd want to tackle this in Java first, since that's the
>   only language we currently support? Given that half of your examples
>   are in Python, I'm also assuming Python will be interesting once it's
>   available.
>
> - Along those lines, what languages are represented in the capability
>   matrix? E.g. is Spark ML support as detailed there identical across
>   Java/Scala and Python?
>
> - Have you thought about how this would tie in at the runner level,
>   particularly given the updated Runner API changes that are coming? I'm
>   assuming they'd be provided as composite transforms that (for now) would
>   have no default implementation, given the lack of low-level primitives
>   for ML algorithms, but am curious what your thoughts are there.
>
> - I still don't fully understand how incremental updates due to model
>   drift would tie in at the API level. There's a comment thread in the doc
>   still open tracking this, so no need to comment here additionally. Just
>   pointing it out as one of the things that stands out as potentially
>   having API-level impacts to me that doesn't seem 100% fleshed out in the
>   doc yet (though that admittedly may just be my limited understanding at
>   this point :-).
>
> -Tyler
>
>
> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
>
> > Hi Tyler - my bad. Comments should be enabled now.
> >
> > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau <taki...@google.com.invalid>
> > wrote:
> >
> > > Thanks a lot, Kam. Can you please enable comment access on the doc? I
> > > seem to have view access only.
> > >
> > > -Tyler
> > >
> > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <kamkasr...@gmail.com>
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > A number of readers have made comments on this topic recently. We
> > > > have created a document that does some analysis of common ML
> > > > models and related APIs. We hope this can drive an approach that
> > > > will result in an API, compatibility matrix and involvement from
> > > > the same groups that are implementing transformation runners
> > > > (spark, flink, etc). We welcome comments here or in the document
> > > > itself.
> > > >
> > > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing