Re: Fwd: machine learning API, common models

2016-05-30 Thread Simone Robutti
+1

2016-05-27 17:18 GMT+02:00 Kam Kasravi <kamkasr...@gmail.com>:

> Hi Beam ML community
>
> Based on comments from a number of you and some discussion we've had here
> we thought we would suggest the following direction:
>
> - Begin with primitive operations common and critical to almost all ML
>   algorithms. These primitive operators would include:
>    - linear algebra operations, borrowing from established libraries
>      like Samsara.
>    - iterative processing, also central to ML, where replay of datasets
>      is easy to specify, as are thresholds or halting criteria. This
>      coordinates well with FlinkML's current approach and base APIs.
>    - possibly new broadcast mechanisms not normally available within BSP
>      frameworks such as Beam.
> - Normalize datasets and parameters that differ across the current major
>   ML libraries that offer the same types of models.
> - Favor a native ML implementation rather than a thin wrapper in order
>   to provide consistency across runners. This will also allow Beam ML to
>   maximize quality and minimize consistency issues across runners.
> - Support the languages also supported by the Beam runners (Java,
>   Python, Scala).
> - Implement several common ML algorithms using the low-level primitives
>   on one or more available runners to validate both the low-level APIs
>   and possible improvements to the high-level API.
>
> Scikit-learn pipelines and existing portable libraries like xgboost4j will
> be valuable for modeling the high-level APIs, for example how xgboost4j
> currently integrates with Spark and Flink.
>
> We welcome further comments and further refinements in approach.
>
> On Sun, May 22, 2016 at 7:43 PM, Henry Saputra <henry.sapu...@gmail.com>
> wrote:
>
> > @Frances:
> >
> > that would probably be the way to go IF we decide to have ML in Beam.
> >
> > @Simone:
> >
> > I would definitely love to see Beam introduce ML model APIs to abstract
> > and unify all "dataflow" runner frameworks, such as with Flink ML and
> > Spark ML.
> >
> > However, as you mentioned before, the target audience would be
> > distributed or ML engineers.
> > But I could see that we would then have to provide some out-of-the-box ML
> > algorithms (model training and fine-tuning) in addition to testing the
> > model and APIs.
> >
> > The expectation would be that these models are "production" ready, in
> > which case they will mostly be used by Data Scientists via some
> > configuration, since they won't, and most can't, use the Java language.
> >
> > I would love to see instead more integration with existing ML frameworks
> > like XGBoost [1], Mahout Samsara [2], or DL4J [3] for ML APIs and models
> > in Beam.
> >
> > Thoughts and comments are definitely welcome =)
> >
> > - Henry
> >
> > [1] https://github.com/dmlc/xgboost
> > [2] https://mahout.apache.org/users/environment/out-of-core-reference.html
> > [3] http://deeplearning4j.org
> > <http://deeplearning4j.org/image-data-pipeline.html#record>
> >
> >
> > On Sat, May 21, 2016 at 2:01 AM, Simone Robutti <
> > simone.robu...@radicalbit.io> wrote:
> >
> > > I think these APIs won't be used by Data Scientists (R, Python) but by
> > > Machine Learning Engineers (Scala, Java or C++ in different
> > > environments), and as an ML Engineer it makes a lot of sense to me to
> > > have such an API if I'm using Beam. It would make a lot more sense to
> > > implement algorithms directly in Beam, but that will come in the
> > > future, I hope.
> > >
> > > 2016-05-21 0:35 GMT+02:00 Henry Saputra <henry.sapu...@gmail.com>:
> > >
> > > > I am a bit concerned about adding ML model APIs to Beam because of
> > > > the fluctuating nature of the ML landscape, and also because, in
> > > > reality, most data scientists tend to use Python and R for most of
> > > > their work with existing model definitions.
> > > >
> > > > Even though you could say something like Spark ML is popular, that
> > > > is merely because it involves Apache Spark rather than because of
> > > > the quality of the ML module itself.
> > > >
> > > > The pipeline and most of the tooling are inspired by scikit-learn,
> > > > and hence it relies on familiarity with that library to attract
> > > > developers.
> > > >
> > > > My question is whether fully end-to-end ML APIs are needed as part of
> > > > core Beam

Re: Will Beam provide a machine learning API in the future?

2016-04-20 Thread Simone Robutti
This would be an interesting feature. We are looking forward to developing ML
integrations on Beam and we are watching what's going on. The idea of an ML
API may be interesting as a higher-level API or as a proper ML library
written in Beam (pretty much what SAMOA does), but beware of offering a
common layer across different algorithmic implementations: the assumption
that they are consistent in nature and implementation is a big one, and it
could lead to a lot of design problems for you and usability problems for
the end user.
On 20 Apr 2016 06:16 AM, "Jean-Baptiste Onofré" wrote:

> Hi Jianfeng
>
> As you can see in the "Technical Vision" document:
>
>
> https://drive.google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc=sharing
>
> I proposed "Machine Learning functions support".
>
> It's not the highest priority right now, but it's something that we plan.
>
> Regards
> JB
>
> On 04/20/2016 04:23 AM, Jianfeng Qian wrote:
>
>> Hi ,
>>
>> Machine learning is becoming more and more popular today, for example
>> Spark's MLlib, FlinkML and Google Cloud Machine Learning.
>>
>> Will Beam provide a machine learning API in the future?
>>
>> Would anyone be interested in working on it?
>>
>>
>>
>> Best Regards,
>>
>> Jianfeng
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>