Re: Oryx + Spark mllib

Nick Pentreath Sun, 19 Oct 2014 08:39:06 -0700

Well, when I started development ~2 years ago, Scalatra just appealed more,
being more lightweight (I didn't need MVC, just barebones REST endpoints),
and I still find its API / DSL much nicer to work with. Also, the Swagger
API docs integration was important to me. So it's more familiarity than any
other reason.


If I were to build a model server from scratch, perhaps Spray/Akka HTTP
would be the better way to go, purely for integration purposes.

Having said that, I think Scalatra is great and performant, so it's not a
no-brainer either way.

On Sun, Oct 19, 2014 at 5:29 PM, Debasish Das <debasish.da...@gmail.com>
wrote:

> Hi Nick,
>
> Any specific reason for choosing Scalatra and not Play/Spray (now that they
> are getting integrated)?
>
> Sean,
>
> Would you be interested in a Play and Akka clustering based module in
> Oryx 2, to see how it compares against the servlets? I am interested in
> understanding the scalability...
>
> Thanks.
> Deb
>
> On Sat, Oct 18, 2014 at 11:22 PM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
>> We've built a model server internally, based on Scalatra and Akka
>> Clustering. Our use case is more geared towards serving possibly thousands
>> of smaller models.
>>
>> It's actually very basic: it just reads models from S3 as strings (!!) (it
>> uses the Hadoop FileSystem API, so it can read from local disk, HDFS, or S3)
>> and uses Breeze for linear algebra. (Technically it is also not dependent
>> on Spark; it could be reading models generated by any computation layer.)
>>
>> It's designed to allow scaling via cluster sharding, by adding nodes (but
>> could also support a load-balanced approach). We are not using persistent
>> actors, since a model reload on node failure is not a disaster: we have
>> multiple levels of fallback.
>>
>> Currently it is a bit specific to our setup (and only focused on
>> recommendation models for now), but with some work it could be made
>> generic. I'm certainly considering whether we can find the time to make it
>> a releasable project.
>>
>> One major difference from Oryx is that it only handles the model loading
>> and vector computations, not the filtering and other things that come as
>> part of a recommender system (those are handled elsewhere in our system).
>> It also does not handle data ingestion at all.
>>
>> On Sun, Oct 19, 2014 at 7:10 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Yes, that is exactly what the next 2.x version does. Still in progress
>>> but the recommender app and framework are code-complete. It is not even
>>> specific to MLlib and could plug in other model build functions.
>>>
>>> The current 1.x version will not use MLlib. Neither version uses Play;
>>> both are intended to scale just by adding web servers, however you
>>> usually do that.
>>>
>>> See graphflow too.
>>> On Oct 18, 2014 5:06 PM, "Rajiv Abraham" <rajiv.abra...@gmail.com>
>>> wrote:
>>>
>>> > Oryx 2 seems to be geared for Spark
>>> >
>>> > https://github.com/OryxProject/oryx
>>> >
>>> > 2014-10-18 11:46 GMT-04:00 Debasish Das <debasish.da...@gmail.com>:
>>> >
>>> > > Hi,
>>> > >
>>> > > Is someone working on a project to integrate the Oryx model serving
>>> > > layer with Spark? Models will be built using either streaming data or
>>> > > batch data in HDFS and cross-validated with MLlib APIs, but the model
>>> > > serving layer will give API endpoints like Oryx and read the models,
>>> > > maybe from HDFS/Impala/SparkSQL.
>>> > >
>>> > > One of the requirements is that the API layer should be scalable and
>>> > > elastic: as requests grow, we should be able to add more nodes, using
>>> > > Play and the Akka clustering module...
>>> > >
>>> > > If there is an ongoing project on GitHub, please point to it...
>>> > >
>>> > > Is there a plan to add a model serving and experimentation layer to
>>> > > MLlib?
>>> > >
>>> > > Thanks.
>>> > > Deb
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Take care,
>>> > Rajiv
>>> >
>>>
>>
>>
>
