Re: Fwd: machine learning API, common models

Jianfeng Qian Thu, 19 May 2016 05:47:38 -0700

Hi,
I am quite interested in this proposal.
It is great that it considers many machine learning projects.
Currently, most algorithms in Spark MLlib are batch-oriented, while
Oryx 2 and streamDM focus on real-time machine learning.
Flink is also working with the SAMOA team to integrate stream mining algorithms.
So I wonder: is it possible to design a flexible SDK that allows users
to call different third-party packages or their own algorithms?
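
To make the question concrete, here is a minimal sketch of what such a pluggable SDK could look like. Everything in it is hypothetical (the Learner interface, the register/create helpers and the "mean-baseline" name are invented for illustration) and none of it is part of Beam, Spark MLlib, Oryx 2, streamDM or SAMOA. A real adapter would wrap one of those libraries behind the same interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence


class Learner(ABC):
    """Hypothetical common interface every plugged-in algorithm implements."""

    @abstractmethod
    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "Learner":
        ...

    @abstractmethod
    def predict(self, xs: Sequence[float]) -> List[float]:
        ...


# Registry mapping a backend name to a factory that builds its Learner.
_REGISTRY: Dict[str, Callable[[], Learner]] = {}


def register(name: str):
    """Decorator that makes a Learner available under the given name."""
    def decorator(factory):
        _REGISTRY[name] = factory
        return factory
    return decorator


def create(name: str) -> Learner:
    """Build the Learner registered under name, or fail with a clear error."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


@register("mean-baseline")
class MeanBaseline(Learner):
    """Toy user-supplied algorithm: always predicts the training mean."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean for _ in xs]


model = create("mean-baseline").fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(model.predict([10.0, 20.0]))
```

User code only ever calls create(...), fit and predict, so swapping a third-party package for a home-grown algorithm would mean registering another adapter rather than changing the caller.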


Best,
Jianfeng

On 2016-05-17 22:01, Suneel Marthi wrote:
> Thanks Simone for pointing this out.
>
> On the Apache Mahout project we have distributed linear algebra with R-like
> semantics that can be executed on Spark/Flink/H2O.
>
> @Kam: the document you point to is old and outdated; the most up-to-date
> reference to the Samsara API is the book 'Apache Mahout: Beyond
> MapReduce'. (shameless marketing here on behalf of fellow committers :) )
>
> We added Flink DataSet API support in the recent Mahout 0.12.0 release (April 11,
> 2016), and it was called out in my talk at ApacheBigData in Vancouver last
> week.
>
> The Mahout community would definitely be interested in being involved with
> this and sharing notes.
>
> IMHO, the focus should first be on building a good linalg foundation
> before embarking on building algos and pipelines. Adding @dlyubimov to this.
>
>
>
> ---------- Forwarded message ----------
> From: Simone Robutti <simone.robu...@radicalbit.io>
> Date: Tue, May 17, 2016 at 9:48 AM
> Subject: Fwd: machine learning API, common models
> To: Suneel Marthi <smar...@apache.org>
>
>
>
> ---------- Forwarded message ----------
> From: Kavulya, Soila P <soila.p.kavu...@intel.com>
> Date: 2016-05-17 1:53 GMT+02:00
> Subject: RE: machine learning API, common models
> To: "dev@beam.incubator.apache.org" <dev@beam.incubator.apache.org>
>
>
> Thanks Simone,
>
> You have raised a valid concern about how different frameworks will have
> different implementations and parameter semantics for the same algorithm. I
> agree that it is important to keep this in mind. Hopefully, through this
> exercise, we will identify a good set of common ML abstractions across
> different frameworks.
>
> Feel free to edit the document. We had limited the first pass of the
> comparison matrix to the machine learning pipeline APIs, but we can extend
> it to include other ML building blocks like linear algebra operations, and
> APIs for optimizers like gradient descent.
>
> Soila
>
> -----Original Message-----
> From: Kam Kasravi [mailto:kamkasr...@gmail.com]
> Sent: Monday, May 16, 2016 8:22 AM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Thanks Simone - yes I had read your concerns on dev and I think they're
> well founded.
> Thanks for the Samsara reference - I've been looking at the Spark/Scala
> bindings http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
> .
>
> I think we should expand the document to include linear algebraic ops or at
> least pay due diligence to them. If you're doing anything on the Flink side
> in this regard, let us know, or feel free to suggest edits/updates to the document.
>
> Thanks
> Kam
>
> On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
> simone.robu...@radicalbit.io> wrote:
>
>> Hello,
>>
>> I'm Simone and I just began contributing to Flink ML (actually on the
>> distributed linalg part). I already expressed my concerns about the
>> idea of a high-level API relying on specific frameworks' implementations:
>> different implementations produce different results and may vary in
>> quality. Also, the semantics of parameters may change from one
>> implementation to another. This could hinder portability and
>> transparency. I believe these problems could be handled by paying due
>> attention to the details of every single implementation, but I invite
>> you not to underestimate them.
>>
>> On the other hand the API in itself looks good to me. From my side, I
>> hope to fill some of the gaps in Flink you underlined in the comparison
>> matrix.
>> Talking about matrices, proper matrices this time, I believe it would
>> be useful to include in this API support for linear algebra operations.
>> Something similar is already present in Mahout's Samsara and it looks
>> really good but clearly a similar implementation on Beam would be way
>> more interesting and powerful.
>>
>> My 2 cents,
>>
>> Simone
>>
>>
>> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <soila.p.kavu...@intel.com>:
>>
>>> Hi Tyler,
>>>
>>> Thank you so much for your feedback. I agree that starting with the
>>> high-level API is a good direction. We are interested in Python
>>> because it is the language that our data scientists are most familiar
>>> with. I think starting with Java would be the best approach, because the
>>> Python API can be a thin wrapper for the Java API.
>>>
>>> In Spark, the Scala, Java and Python APIs are identical. Flink does
>>> not have a Python API for ML pipelines at present.
>>>
>>> Could you point me to the updated runner API?
>>>
>>> Soila
>>>
>>> -----Original Message-----
>>> From: Tyler Akidau [mailto:taki...@google.com.INVALID]
>>> Sent: Friday, May 13, 2016 6:34 PM
>>> To: dev@beam.incubator.apache.org
>>> Subject: Re: machine learning API, common models
>>>
>>> Hi Kam & Soila,
>>>
>>> Thanks a lot for writing this up. I ran the doc past some of the
>>> folks who've been doing ML work here at Google, and they were
>>> generally happy with the distillation of common methods in the doc.
>>> I'd be curious to hear what folks on the Flink- and Spark-runner sides
>>> think.
>>>
>>> To me, this seems like a good direction for a high-level API.
>>> Presumably, once a high-level API is in place, we could begin
>>> looking at what it would take to add lower-level ML algorithm support
>>> (e.g. iterative) to the Beam Model. Is this essentially what you're thinking?
>>>
>>> Some more specific questions/comments:
>>>
>>>     - Presumably you'd want to tackle this in Java first, since that's the
>>>     only language we currently support? Given that half of your examples
>>>     are in Python, I'm also assuming Python will be interesting once it's
>>>     available.
>>>
>>>     - Along those lines, what languages are represented in the capability
>>>     matrix? E.g. is Spark ML support as detailed there identical across
>>>     Java/Scala and Python?
>>>
>>>     - Have you thought about how this would tie in at the runner level,
>>>     particularly given the updated Runner API changes that are coming? I'm
>>>     assuming they'd be provided as composite transforms that (for now) would
>>>     have no default implementation, given the lack of low-level primitives
>>>     for ML algorithms, but am curious what your thoughts are there.
>>>
>>>     - I still don't fully understand how incremental updates due to model
>>>     drift would tie in at the API level. There's a comment thread in the doc
>>>     still open tracking this, so no need to comment here additionally. Just
>>>     pointing it out as one of the things that stands out to me as potentially
>>>     having API-level impacts and that doesn't seem 100% fleshed out in the doc
>>>     yet (though that admittedly may just be my limited understanding at this
>>>     point :-).
>>>
>>> -Tyler
>>>
>>>
>>>
>>>
>>> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
>>>> Hi Tyler - my bad. Comments should be enabled now.
>>>>
>>>> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau <taki...@google.com.invalid> wrote:
>>>>
>>>>> Thanks a lot, Kam. Can you please enable comment access on the doc?
>>>>> I seem to have view access only.
>>>>>
>>>>> -Tyler
>>>>>
>>>>> On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <kamkasr...@gmail.com> wrote:
>>>>>> Hi
>>>>>>
>>>>>> A number of readers have made comments on this topic recently.
>>>>>> We have created a document that does some analysis of common
>>>>>> ML models and related APIs. We hope this can drive an approach that
>>>>>> will result in an API, compatibility matrix and involvement from the
>>>>>> same groups that are implementing transformation runners (spark,
>>>>>> flink, etc).
>>>>>> We welcome comments here or in the document itself.
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
