Re: machine learning API, common models

Simone Robutti Mon, 16 May 2016 06:05:47 -0700

Hello,

I'm Simone and I just began contributing to Flink ML (specifically the
distributed linalg part). I have already expressed my concerns about the
idea of a high-level API relying on specific frameworks' implementations:
different implementations produce different results and may vary in
quality. Also, the semantics of parameters may change from one
implementation to another. This could hinder portability and
transparency. I believe these problems can be handled by paying due
attention to the details of every single implementation, but I invite you
not to underestimate them.
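
To make the point about parameter semantics concrete, here is a minimal
sketch. Everything in it is hypothetical: the two "backends", their
parameter names (reg_param, C, max_iter, iterations) and the CommonParams
class are invented for illustration and are not taken from Flink, Spark or
Beam. It only shows that a common API would need an explicit translation
step per implementation rather than passing values through unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonParams:
    """Parameters as a high-level API might define them (hypothetical)."""
    regularization_strength: float  # larger value = stronger penalty
    max_iterations: int


def to_backend_a(params: CommonParams) -> dict:
    # Hypothetical backend A takes the penalty strength directly.
    return {
        "reg_param": params.regularization_strength,
        "max_iter": params.max_iterations,
    }


def to_backend_b(params: CommonParams) -> dict:
    # Hypothetical backend B takes the inverse strength, so passing the
    # same number through unchanged would mean the opposite thing.
    if params.regularization_strength <= 0:
        raise ValueError("backend B cannot express a non-positive strength")
    return {
        "C": 1.0 / params.regularization_strength,
        "iterations": params.max_iterations,
    }


params = CommonParams(regularization_strength=0.5, max_iterations=100)
print(to_backend_a(params))
print(to_backend_b(params))
```

Even in this toy case the second mapping is lossy (a strength of zero
cannot be expressed), which is the kind of detail I mean.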


On the other hand, the API itself looks good to me. For my part, I hope
to fill some of the gaps in Flink that you highlighted in the comparison
matrix.

Talking about matrices, proper matrices this time, I believe it would be
useful to include support for linear algebra operations in this API.
Something similar is already present in Mahout's Samsara and it looks
really good, but a similar implementation on Beam would clearly be far
more interesting and powerful.

My 2 cents,

Simone


2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <soila.p.kavu...@intel.com>:

> Hi Tyler,
>
> Thank you so much for your feedback. I agree that starting with the
> high-level API is a good direction. We are interested in Python because it
> is the language that our data scientists are most familiar with. I think
> starting with Java would be the best approach, because the Python API can
> be a thin wrapper for the Java API.
>
> In Spark, the Scala, Java and Python APIs are identical. Flink does not
> have a Python API for ML pipelines at present.
>
> Could you point me to the updated runner API?
>
> Soila
>
> -----Original Message-----
> From: Tyler Akidau [mailto:taki...@google.com.INVALID]
> Sent: Friday, May 13, 2016 6:34 PM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Hi Kam & Soila,
>
> Thanks a lot for writing this up. I ran the doc past some of the folks
> who've been doing ML work here at Google, and they were generally happy
> with the distillation of common methods in the doc. I'd be curious to hear
> what folks on the Flink- and Spark- runner sides think.
>
> To me, this seems like a good direction for a high-level API. Presumably,
> once a high-level API is in place, we could begin looking at what it would
> take to add lower-level ML algorithm support (e.g. iterative) to the Beam
> Model. Is this essentially what you're thinking?
>
> Some more specific questions/comments:
>
>    - Presumably you'd want to tackle this in Java first, since that's the
>    only language we currently support? Given that half of your examples
>    are in Python, I'm also assuming Python will be interesting once it's
>    available.
>
>    - Along those lines, what languages are represented in the capability
>    matrix? E.g. is Spark ML support as detailed there identical across
>    Java/Scala and Python?
>
>    - Have you thought about how this would tie in at the runner level,
>    particularly given the updated Runner API changes that are coming? I'm
>    assuming they'd be provided as composite transforms that (for now) would
>    have no default implementation, given the lack of low-level primitives
>    for ML algorithms, but am curious what your thoughts are there.
>
>    - I still don't fully understand how incremental updates due to model
>    drift would tie in at the API level. There's a comment thread in the doc
>    still open tracking this, so no need to comment here additionally. Just
>    pointing it out as one of the things that stands out to me as
>    potentially having API-level impacts and that doesn't seem 100% fleshed
>    out in the doc yet (though that admittedly may just be my limited
>    understanding at this point :-).
>
> -Tyler
>
>
>
>
> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
>
> > Hi Tyler - my bad. Comments should be enabled now.
> >
> > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau <taki...@google.com.invalid>
> > wrote:
> >
> > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > I seem to have view access only.
> > >
> > > -Tyler
> > >
> > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
> > >
> > > > Hi
> > > >
> > > > A number of readers have made comments on this topic recently. We
> > > > have created a document that does some analysis of common ML
> > > > models and related APIs. We hope this can drive an approach that
> > > > will result in an API, a compatibility matrix, and involvement from
> > > > the same groups that are implementing transformation runners
> > > > (Spark, Flink, etc.). We welcome comments here or in the document
> > > > itself.
> > > >
> > > >
> > > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > >
> > >
> >
>
