Hi Tyler,

Thank you so much for your feedback. I agree that starting with the high-level API is a good direction. We are interested in Python because it is the language our data scientists are most familiar with. That said, I think starting with Java would be the best approach, since the Python API can then be a thin wrapper around the Java API.
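The "thin wrapper" idea above can be sketched roughly as follows. This is purely illustrative: the class names are hypothetical, and the JVM bridge (which in practice might be something like py4j) is simulated with a plain Python object so the sketch is self-contained.

```python
# Hypothetical sketch of a Python API that is a thin wrapper over a
# Java implementation. All names here are made up for illustration;
# the JVM-side handle is simulated so no actual bridge is required.

class _SimulatedJavaModel:
    """Stand-in for a handle to a Java-side model object."""
    def __init__(self, iterations):
        self.iterations = iterations

    def fit(self, data):
        # In a real bridge, this call would execute on the JVM.
        return {"trained_on": len(data), "iterations": self.iterations}


class LogisticRegression:
    """Thin Python wrapper: holds a handle to the Java object and
    forwards every call, so all the real logic lives on the Java side."""
    def __init__(self, iterations=10):
        self._jmodel = _SimulatedJavaModel(iterations)

    def fit(self, data):
        return self._jmodel.fit(data)


model = LogisticRegression(iterations=5)
result = model.fit([(1.0, 0), (2.0, 1)])
print(result)  # {'trained_on': 2, 'iterations': 5}
```

The point of the pattern is that the Python layer carries no algorithmic logic of its own, so the Java API remains the single source of truth.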
In Spark, the Scala, Java, and Python APIs are identical. Flink does not currently have a Python API for ML pipelines. Could you point me to the updated Runner API?

Soila

-----Original Message-----
From: Tyler Akidau [mailto:taki...@google.com.INVALID]
Sent: Friday, May 13, 2016 6:34 PM
To: dev@beam.incubator.apache.org
Subject: Re: machine learning API, common models

Hi Kam & Soila,

Thanks a lot for writing this up. I ran the doc past some of the folks who've been doing ML work here at Google, and they were generally happy with the distillation of common methods in the doc. I'd be curious to hear what folks on the Flink- and Spark-runner sides think.

To me, this seems like a good direction for a high-level API. Presumably, once a high-level API is in place, we could begin looking at what it would take to add lower-level ML algorithm support (e.g. iterative) to the Beam Model. Is this essentially what you're thinking?

Some more specific questions/comments:

- Presumably you'd want to tackle this in Java first, since that's the only language we currently support? Given that half of your examples are in Python, I'm also assuming Python will be interesting once it's available.

- Along those lines, what languages are represented in the capability matrix? E.g. is Spark ML support as detailed there identical across Java/Scala and Python?

- Have you thought about how this would tie in at the runner level, particularly given the updated Runner API changes that are coming? I'm assuming they'd be provided as composite transforms that (for now) would have no default implementation, given the lack of low-level primitives for ML algorithms, but am curious what your thoughts are there.

- I still don't fully understand how incremental updates due to model drift would tie in at the API level. There's a comment thread in the doc still open tracking this, so no need to comment here additionally.
Just pointing it out as one of the things that stands out as potentially having API-level impacts to me that doesn't seem 100% fleshed out in the doc yet (though that admittedly may just be my limited understanding at this point :-).

-Tyler

On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <kamkasr...@gmail.com> wrote:

> Hi Tyler - my bad. Comments should be enabled now.
>
> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau <taki...@google.com.invalid> wrote:
>
>> Thanks a lot, Kam. Can you please enable comment access on the doc? I seem to have view access only.
>>
>> -Tyler
>>
>> On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> A number of readers have made comments on this topic recently. We have created a document that does some analysis of common ML models and related APIs. We hope this can drive an approach that will result in an API, compatibility matrix and involvement from the same groups that are implementing transformation runners (Spark, Flink, etc). We welcome comments here or in the document itself.
>>>
>>> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
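Tyler's suggestion in the thread above, that ML operations could be exposed as composite transforms with no default implementation, might look something like the following sketch. All names here are hypothetical and deliberately simplified; this does not reflect Beam's actual PTransform or Runner API, only the general pattern: the transform declares no default expansion, and a runner that recognizes it substitutes its own implementation, otherwise construction fails fast.

```python
# Illustrative sketch only: a composite "transform" with no default
# implementation, which a specific runner may recognize and replace.
# Class names (PTransformSketch, TrainClassifier, HypotheticalSparkRunner)
# are invented for this example and are not real Beam APIs.

class PTransformSketch:
    """Minimal stand-in for a transform abstraction."""
    def expand(self, input_collection):
        raise NotImplementedError


class TrainClassifier(PTransformSketch):
    """Composite ML transform with no default expansion, since the
    model lacks low-level primitives for ML algorithms."""
    def expand(self, input_collection):
        raise NotImplementedError(
            "TrainClassifier has no default expansion; "
            "a runner must provide an implementation.")


class HypotheticalSparkRunner:
    """A runner that recognizes the composite and substitutes its
    own engine-specific implementation."""
    def apply(self, transform, input_collection):
        if isinstance(transform, TrainClassifier):
            # Pretend an engine-side ML library does the training here.
            return {"model": "trained-by-runner", "n": len(input_collection)}
        # Unrecognized composites fall back to their own expansion,
        # which for TrainClassifier would raise immediately.
        return transform.expand(input_collection)


runner = HypotheticalSparkRunner()
out = runner.apply(TrainClassifier(), [1, 2, 3])
print(out)  # {'model': 'trained-by-runner', 'n': 3}
```

Under this pattern, a pipeline using an ML transform on a runner that does not support it fails at construction time rather than producing a silently incorrect result, which seems consistent with the capability-matrix approach discussed in the doc.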