Re: machine learning API, common models

Kam Kasravi Mon, 16 May 2016 08:22:36 -0700

Thanks Simone - yes, I had read your concerns on dev and I think they're
well founded.
Thanks for the Samsara reference - I've been looking at the Spark/Scala
bindings:
http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf


I think we should expand the document to include linear algebraic ops, or
at least pay due diligence to them. If you're doing anything on the Flink
side in this regard, let us know, or feel free to suggest edits/updates to
the document.

Thanks
Kam

On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <simone.robu...@radicalbit.io> wrote:

> Hello,
>
> I'm Simone and I just began contributing to Flink ML (actually on the
> distributed linalg part). I already expressed my concerns about the idea of
> a high-level API relying on specific frameworks' implementations:
> different implementations produce different results and may vary in
> quality. Also, the semantics of parameters may change from one
> implementation to the other. This could hinder portability and
> transparency. I believe these problems could be handled by paying due
> attention to the details of every single implementation, but I invite you
> not to underestimate them.
>
> On the other hand the API in itself looks good to me. From my side, I hope
> to fill some of the gaps in Flink you underlined in the comparison matrix.
>
> Talking about matrices, proper matrices this time, I believe it would be
> useful to include in this API support for linear algebra operations.
> Something similar is already present in Mahout's Samsara and it looks
> really good but clearly a similar implementation on Beam would be way more
> interesting and powerful.
>
> My 2 cents,
>
> Simone
>
>
> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <soila.p.kavu...@intel.com>:
>
> > Hi Tyler,
> >
> > Thank you so much for your feedback. I agree that starting with the
> > high-level API is a good direction. We are interested in Python because
> > it is the language that our data scientists are most familiar with. I
> > think starting with Java would be the best approach, because the Python
> > API can be a thin wrapper for the Java API.
> >
> > In Spark, the Scala, Java and Python APIs are identical. Flink does not
> > have a Python API for ML pipelines at present.
> >
> > Could you point me to the updated runner API?
> >
> > Soila
> >
> > -----Original Message-----
> > From: Tyler Akidau [mailto:taki...@google.com.INVALID]
> > Sent: Friday, May 13, 2016 6:34 PM
> > To: dev@beam.incubator.apache.org
> > Subject: Re: machine learning API, common models
> >
> > Hi Kam & Soila,
> >
> > Thanks a lot for writing this up. I ran the doc past some of the folks
> > who've been doing ML work here at Google, and they were generally happy
> > with the distillation of common methods in the doc. I'd be curious to
> > hear what folks on the Flink- and Spark-runner sides think.
> >
> > To me, this seems like a good direction for a high-level API. Presumably,
> > once a high-level API is in place, we could begin looking at what it
> > would take to add lower-level ML algorithm support (e.g. iterative) to
> > the Beam Model. Is this essentially what you're thinking?
> >
> > Some more specific questions/comments:
> >
> >    - Presumably you'd want to tackle this in Java first, since that's the
> >    only language we currently support? Given that half of your examples
> >    are in Python, I'm also assuming Python will be interesting once it's
> >    available.
> >
> >    - Along those lines, what languages are represented in the capability
> >    matrix? E.g. is Spark ML support as detailed there identical across
> >    Java/Scala and Python?
> >
> >    - Have you thought about how this would tie in at the runner level,
> >    particularly given the updated Runner API changes that are coming? I'm
> >    assuming they'd be provided as composite transforms that (for now)
> >    would have no default implementation, given the lack of low-level
> >    primitives for ML algorithms, but am curious what your thoughts are
> >    there.
> >
> >    - I still don't fully understand how incremental updates due to model
> >    drift would tie in at the API level. There's a comment thread in the
> >    doc still open tracking this, so no need to comment here additionally.
> >    Just pointing it out as one of the things that stands out to me as
> >    potentially having API-level impacts and that doesn't seem 100%
> >    fleshed out in the doc yet (though that admittedly may just be my
> >    limited understanding at this point :-).
> >
> > -Tyler
> >
> >
> >
> >
> > On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
> >
> > > Hi Tyler - my bad. Comments should be enabled now.
> > >
> > > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau <taki...@google.com.invalid> wrote:
> > >
> > > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > > I seem to have view access only.
> > > >
> > > > -Tyler
> > > >
> > > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > > A number of readers have made comments on this topic recently. We
> > > > > > have created a document that does some analysis of common ML
> > > > > > models and related APIs. We hope this can drive an approach that
> > > > > > will result in an API, a compatibility matrix, and involvement from
> > > > > > the same groups that are implementing transformation runners
> > > > > > (Spark, Flink, etc.). We welcome comments here or in the document
> > > > > > itself.
> > > > >
> > > > >
> > > > >
> > > >
> > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4
> > > PBECHb-xA/edit?usp=sharing
> > > > >
> > > >
> > >
> >
>
