Research ideas using Spark
Hi, I am doing my PhD thesis on large-scale machine learning, e.g. online learning, batch learning, and mini-batch learning. Could somebody help me with ideas, especially in the context of Spark and the above learning methods? Some directions I have in mind: improvements to existing algorithms, implementing new features for the above learning methods, algorithms that have not yet been implemented, etc. If somebody could help me with some ideas it would really accelerate my work. A few pointers to research papers regarding Spark or Mahout would also help. Thanks in advance. Regards
Model deployment help
Hi, Apologies for the generic question. I am developing predictive models for the first time, and the model will be deployed to production very soon. Could somebody help me with model deployment in production? I have read quite a bit on model deployment and some books on database deployment. My questions: how are model updates handled without downtime when the current model degenerates, how are others deploying to production servers, and how widely is PMML adopted in production today? Please point me to some good links or forums where I can learn more; most books do not cover this extensively except 'Mahout in Action', where it is explained in some detail, and I have also checked Stack Overflow without finding relevant answers.

What I understand:
1. Build a model using the current training set and test the model.
2. Deploy the model: put it in some location, load it, and predict when a scoring request comes in.
3. The model degenerates, so build a new model with new data. (Some confusion here: is the old data discarded completely, is the new model built purely from new data, or from a mix?)
4. Here I am stuck: how do I update the model without any downtime during the transition period between the old model and the new one?

My naive solution would be: build the new model, save it to a new location, and once the save is done, update the new path in a properties file or update the location in the database. Is this correct, or are there established best practices? A database is unlikely in my case. Thanks in advance.
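The naive solution described above (build the new model, save it to a new location, then flip a pointer once saving is done) is essentially the standard zero-downtime pattern: never mutate the live model, publish only a fully built and validated one, and swap references atomically. A minimal in-process sketch, with hypothetical names (any object with a `predict` method stands in for the real model):

```python
import threading

class ModelServer:
    """Serves predictions from a 'current' model and lets a newly
    trained model be swapped in atomically, with no downtime.
    All names here are hypothetical; 'model' is any object exposing
    a predict() method."""

    def __init__(self, initial_model):
        self._lock = threading.Lock()
        self._model = initial_model

    def predict(self, features):
        # Grab the current reference once; in-flight requests keep
        # using whichever model they grabbed, even during a swap.
        model = self._model
        return model.predict(features)

    def swap(self, new_model):
        # Publish the new model only after it is fully built and
        # validated offline; the old one is returned for archiving.
        with self._lock:
            old, self._model = self._model, new_model
        return old
```

The same idea applies across processes: the retraining job writes the new model to a new path, and the serving layer re-reads the "current" pointer (a properties file, symlink, or database row), loads the new model in the background, and only then switches, so the old model keeps serving during the transition. Whether to retrain on old data, new data, or a mix is a separate modelling choice; a common compromise is a sliding window of recent data, optionally mixed with a sample of older data.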
Re: Software stack for recommendation engine with Spark MLlib
Hi, Just two follow-up questions, please suggest:

1. Is there any commercial recommendation engine, apart from the open-source tools (Mahout, Spark), that anybody can suggest?
2. In this case only the purchase transaction is captured. There are no ratings, no feedback, and no page views calculated by the application, so how effective will the recommendation engine be at recommending similar products to a user? What features should be available in order to build a robust recommendation engine (e.g. product views)? Please kindly suggest a few features that should be available. Is purchase data alone enough?

Thanks in advance.

On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen wrote:
> I think you're assuming that you will pre-compute recommendations and store them in Mongo. That's one way to go, with certain tradeoffs. You can precompute offline easily, and serve results at large scale easily, but, you are forced to precompute everything -- lots of wasted effort, not completely up to date.
>
> The front-end part of the stack looks right.
>
> Spark would do the model building; you'd have to write a process to score recommendations and store the result. Mahout is the same thing, really.
>
> 500K items isn't all that large. Your requirements aren't driven just by items though. Number of users and latent features matter too. It matters how often you want to build the model too. I'm guessing you would get away with a handful of modern machines for a problem this size.
>
> In a way what you describe reminds me of Wibidata, since it built recommender-like solutions on top of data and results published to a NoSQL store. You might glance at the related OSS project Kiji (http://kiji.org/) for ideas about how to manage the schema.
>
> You should have a look at things like Nick's architecture for Graphflow, however it's more concerned with computing recommendation on the fly, and describes a shift from an architecture originally built around something like a NoSQL store:
>
> http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf
>
> This is also the kind of ground the oryx project is intended to cover, something I've worked on personally: https://github.com/OryxProject/oryx -- a layer on and around the core model building in Spark + Spark Streaming to provide a whole recommender (for example), down to the REST API.
>
> On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao wrote:
> > Hi,
> >
> > Can anyone who has developed recommendation engine suggest what could be the possible software stack for such an application.
> >
> > I am basically new to recommendation engine, I just found out Mahout and Spark Mlib which are available. I am thinking the below software stack.
> >
> > 1. The user is going to use Android app.
> > 2. Rest Api sent to app server from the android app to get recommendations.
> > 3. Spark Mlib core engine for recommendation engine
> > 4. MongoDB database backend.
> >
> > I would like to know more on the cluster configuration (how many nodes etc) part of spark for calculating the recommendations for 500,000 items. This items include products for day care etc.
> >
> > Other software stack suggestions would also be very useful. It has to run on multiple vendor machines.
> >
> > Please suggest.
> >
> > Thanks
> > shashi
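Regarding question 2 above: purchase events alone put you in the "implicit feedback" setting (binary signals, no ratings), and purchases can be enough to start; extra signals such as product views mainly help with users and items that have few purchases. As an illustration only, here is a toy item-item co-occurrence recommender over purchase sets — a stand-in sketch for what an implicit-feedback factorization in Spark MLlib or an item-similarity job in Mahout would compute at scale (function and variable names are hypothetical):

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_recommend(purchases, user, top_k=3):
    """Toy item-item co-occurrence recommender over purchase data only.
    `purchases` maps user id -> set of purchased item ids. This is an
    illustration of the implicit-feedback idea, not Spark MLlib's API."""
    # Count how often each pair of items is bought by the same user.
    cooc = defaultdict(lambda: defaultdict(int))
    for items in purchases.values():
        for a, b in combinations(sorted(items), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    # Score unseen items by total co-occurrence with the user's purchases.
    owned = purchases.get(user, set())
    scores = defaultdict(int)
    for item in owned:
        for other, count in cooc[item].items():
            if other not in owned:
                scores[other] += count
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Even this crude signal produces reasonable recommendations once enough baskets overlap; richer features (views, add-to-cart, search queries, product category metadata) mostly improve cold-start coverage rather than the core signal.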
Re: Software stack for recommendation engine with Spark MLlib
Thanks Nick, for your suggestions.

On Sun, Mar 15, 2015 at 10:41 PM, Nick Pentreath wrote:
> As Sean says, precomputing recommendations is pretty inefficient. Though with 500k items its easy to get all the item vectors in memory so pre-computing is not too bad.
>
> Still, since you plan to serve these via a REST service anyway, computing on demand via a serving layer such as Oryx or PredictionIO (or the newly open sourced Seldon.io) is a good option. You can also cache the recommendations quite aggressively - once you compute a user or item top-K list, just stick the result in mem cache / redis / whatever and evict it when you recompute your offline model, or every hour or whatever.
>
> On Sun, Mar 15, 2015 at 3:03 PM, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:
>> Thanks Sean, your suggestions and the links provided are just what I needed to start off with.
>>
>> On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen wrote:
>>> I think you're assuming that you will pre-compute recommendations and store them in Mongo. That's one way to go, with certain tradeoffs. You can precompute offline easily, and serve results at large scale easily, but, you are forced to precompute everything -- lots of wasted effort, not completely up to date.
>>>
>>> The front-end part of the stack looks right.
>>>
>>> Spark would do the model building; you'd have to write a process to score recommendations and store the result. Mahout is the same thing, really.
>>>
>>> 500K items isn't all that large. Your requirements aren't driven just by items though. Number of users and latent features matter too. It matters how often you want to build the model too. I'm guessing you would get away with a handful of modern machines for a problem this size.
>>>
>>> In a way what you describe reminds me of Wibidata, since it built recommender-like solutions on top of data and results published to a NoSQL store. You might glance at the related OSS project Kiji (http://kiji.org/) for ideas about how to manage the schema.
>>>
>>> You should have a look at things like Nick's architecture for Graphflow, however it's more concerned with computing recommendation on the fly, and describes a shift from an architecture originally built around something like a NoSQL store:
>>>
>>> http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf
>>>
>>> This is also the kind of ground the oryx project is intended to cover, something I've worked on personally: https://github.com/OryxProject/oryx -- a layer on and around the core model building in Spark + Spark Streaming to provide a whole recommender (for example), down to the REST API.
>>>
>>> On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao wrote:
>>> > Hi,
>>> >
>>> > Can anyone who has developed recommendation engine suggest what could be the possible software stack for such an application.
>>> >
>>> > I am basically new to recommendation engine, I just found out Mahout and Spark Mlib which are available. I am thinking the below software stack.
>>> >
>>> > 1. The user is going to use Android app.
>>> > 2. Rest Api sent to app server from the android app to get recommendations.
>>> > 3. Spark Mlib core engine for recommendation engine
>>> > 4. MongoDB database backend.
>>> >
>>> > I would like to know more on the cluster configuration (how many nodes etc) part of spark for calculating the recommendations for 500,000 items. This items include products for day care etc.
>>> >
>>> > Other software stack suggestions would also be very useful. It has to run on multiple vendor machines.
>>> >
>>> > Please suggest.
>>> >
>>> > Thanks
>>> > shashi
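Nick's caching suggestion above (compute a top-K list on demand, cache it aggressively, evict when the offline model is rebuilt) can be sketched as follows; `compute_fn` is a hypothetical scoring function and the plain dict stands in for memcache or Redis:

```python
class RecCache:
    """Caches per-user top-K recommendation lists, keyed by the model
    version that produced them, so a model rebuild implicitly evicts
    stale entries. A toy stand-in for the memcache/Redis pattern;
    compute_fn is a hypothetical (user id) -> list-of-item-ids scorer."""

    def __init__(self, compute_fn):
        self._compute = compute_fn
        self._version = 0
        self._cache = {}

    def get(self, user):
        key = (user, self._version)
        if key not in self._cache:
            # Cache miss: score on demand via the serving layer.
            self._cache[key] = self._compute(user)
        return self._cache[key]

    def on_model_rebuilt(self):
        # Bumping the version makes all old keys unreachable; clearing
        # the dict reclaims the memory immediately.
        self._version += 1
        self._cache.clear()
```

With Redis you would typically get the same effect by setting a TTL on entries, or by prefixing keys with the model version so stale entries simply expire after the next rebuild.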
Re: Software stack for recommendation engine with Spark MLlib
Thanks Sean, your suggestions and the links provided are just what I needed to start off with.

On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen wrote:
> I think you're assuming that you will pre-compute recommendations and store them in Mongo. That's one way to go, with certain tradeoffs. You can precompute offline easily, and serve results at large scale easily, but, you are forced to precompute everything -- lots of wasted effort, not completely up to date.
>
> The front-end part of the stack looks right.
>
> Spark would do the model building; you'd have to write a process to score recommendations and store the result. Mahout is the same thing, really.
>
> 500K items isn't all that large. Your requirements aren't driven just by items though. Number of users and latent features matter too. It matters how often you want to build the model too. I'm guessing you would get away with a handful of modern machines for a problem this size.
>
> In a way what you describe reminds me of Wibidata, since it built recommender-like solutions on top of data and results published to a NoSQL store. You might glance at the related OSS project Kiji (http://kiji.org/) for ideas about how to manage the schema.
>
> You should have a look at things like Nick's architecture for Graphflow, however it's more concerned with computing recommendation on the fly, and describes a shift from an architecture originally built around something like a NoSQL store:
>
> http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf
>
> This is also the kind of ground the oryx project is intended to cover, something I've worked on personally: https://github.com/OryxProject/oryx -- a layer on and around the core model building in Spark + Spark Streaming to provide a whole recommender (for example), down to the REST API.
>
> On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao wrote:
> > Hi,
> >
> > Can anyone who has developed recommendation engine suggest what could be the possible software stack for such an application.
> >
> > I am basically new to recommendation engine, I just found out Mahout and Spark Mlib which are available. I am thinking the below software stack.
> >
> > 1. The user is going to use Android app.
> > 2. Rest Api sent to app server from the android app to get recommendations.
> > 3. Spark Mlib core engine for recommendation engine
> > 4. MongoDB database backend.
> >
> > I would like to know more on the cluster configuration (how many nodes etc) part of spark for calculating the recommendations for 500,000 items. This items include products for day care etc.
> >
> > Other software stack suggestions would also be very useful. It has to run on multiple vendor machines.
> >
> > Please suggest.
> >
> > Thanks
> > shashi
Software stack for recommendation engine with Spark MLlib
Hi,

Can anyone who has developed a recommendation engine suggest a possible software stack for such an application?

I am basically new to recommendation engines; I just found out about Mahout and Spark MLlib, which are available. I am thinking of the software stack below.

1. The user is going to use an Android app.
2. A REST API call is sent from the Android app to the app server to get recommendations.
3. Spark MLlib as the core recommendation engine.
4. MongoDB as the database backend.

I would also like to know more about the Spark cluster configuration (how many nodes, etc.) for calculating the recommendations for 500,000 items. These items include products for day care etc.

Other software stack suggestions would also be very useful. It has to run on machines from multiple vendors.

Please suggest.

Thanks
shashi
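The four steps above can be sketched end to end. Assuming recommendations are precomputed by a Spark MLlib batch job and stored in MongoDB (one of the tradeoffs discussed in the replies), the serving side reduces to a small REST endpoint. The sketch below uses Python's standard `http.server` and an in-memory dict as a stand-in for MongoDB; all names and the URL scheme are hypothetical:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for per-user recommendation lists precomputed by a Spark
# MLlib job and stored in MongoDB; keyed by user id.
PRECOMPUTED = {"42": ["diapers", "wipes", "baby-lotion"]}

class RecHandler(BaseHTTPRequestHandler):
    """Minimal REST endpoint the Android app would call:
    GET /recommendations/<user-id> -> JSON list of item ids."""

    def do_GET(self):
        user = self.path.rstrip("/").rsplit("/", 1)[-1]
        body = json.dumps({"user": user,
                           "items": PRECOMPUTED.get(user, [])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the sketch quiet; a real service would log properly.
        pass
```

To run it: `HTTPServer(("127.0.0.1", 8080), RecHandler).serve_forever()`. In a production stack this layer would be a proper app server in front of the store, with the Spark cluster entirely offline from the request path, which is why a handful of machines can suffice for the 500,000-item model build.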