Re: Software stack for Recommendation engine with spark mlib

2015-03-19 Thread Shashidhar Rao
Hi,

Just two follow-up questions, please suggest:

1. Are there any commercial recommendation engines, apart from the open-source
tools (Mahout, Spark), that anybody can suggest?

2. In this case only purchase transactions are captured: there are no
ratings, no feedback, and no page views recorded by the application. How
effective can the recommendation engine be at recommending similar products
to a user in this setting?
What features should be available in order to build a robust
recommendation engine (e.g. product views)? Please suggest a few
features that should be available.
Is purchase data alone enough?

Thanks in advance.
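One common answer to question 2 is to treat purchase-only data as "implicit feedback", where repeat purchases raise confidence in a preference. A minimal sketch, in the style of implicit-feedback ALS preprocessing (the alpha weight and the sample log below are illustrative assumptions, not anything from this thread):

```python
# Illustrative sketch: turning purchase-only logs into implicit-feedback
# training data. alpha and the sample log are made-up values.
from collections import Counter

def purchases_to_implicit(purchase_log, alpha=40.0):
    """Map raw (user, item) purchase events to (user, item, confidence)
    triples. Repeat purchases raise confidence in the preference."""
    counts = Counter(purchase_log)
    return [(user, item, 1.0 + alpha * n) for (user, item), n in counts.items()]

log = [("u1", "diapers"), ("u1", "diapers"), ("u1", "wipes"), ("u2", "wipes")]
triples = purchases_to_implicit(log, alpha=40.0)
# ("u1", "diapers") was bought twice -> confidence 1 + 40*2 = 81
```

The point is that purchases alone can drive a recommender; extra signals like product views simply become more (lower-confidence) events in the same log.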





Re: Software stack for Recommendation engine with spark mlib

2015-03-15 Thread Shashidhar Rao
Thanks Sean, your suggestions and the links provided are just what I needed
to start off with.




Re: Software stack for Recommendation engine with spark mlib

2015-03-15 Thread Sean Owen
I think you're assuming that you will pre-compute recommendations and
store them in Mongo. That's one way to go, with certain tradeoffs. You
can precompute offline easily, and serve results at large scale
easily, but you are forced to precompute everything -- lots of wasted
effort, not completely up to date.

The front-end part of the stack looks right.

Spark would do the model building; you'd have to write a process to
score recommendations and store the result. Mahout is the same thing,
really.
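The "process to score recommendations" could be sketched roughly like this (plain Python, not the Spark MLlib API; all names are illustrative): once the model yields user and item factor vectors, scoring is a dot product plus a top-K selection, skipping items already purchased:

```python
# Illustrative scoring sketch -- not Spark MLlib code. Factor vectors and
# item names below are made-up examples.
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(user_vec, item_vecs, k=2, exclude=()):
    """Rank items for one user by predicted preference, skipping items
    the user already purchased."""
    scored = ((dot(user_vec, vec), item) for item, vec in item_vecs.items()
              if item not in exclude)
    return [item for score, item in heapq.nlargest(k, scored)]

user = [1.0, 0.5]
items = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.1, 0.1]}
top_k(user, items, k=2)  # highest dot products first
```

In the precompute-and-store design, a batch job would run this for every user and write the lists to the database.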

500K items isn't all that large. Your requirements aren't driven just
by items though. Number of users and latent features matter too. It
matters how often you want to build the model too. I'm guessing you
would get away with a handful of modern machines for a problem this
size.
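A rough sizing illustration of why users and latent features matter as much as items: the factor matrices scale with (users + items) × rank. The 1M-user and rank-50 figures below are made-up assumptions, not numbers from this thread:

```python
# Back-of-envelope model sizing under assumed numbers. The model is mostly
# the two factor matrices, so memory grows with (users + items) * rank.
def factor_matrix_bytes(n_users, n_items, rank, bytes_per_float=4):
    return (n_users + n_items) * rank * bytes_per_float

mb = factor_matrix_bytes(n_users=1_000_000, n_items=500_000, rank=50) / 1e6
# 1.5M rows * 50 floats * 4 bytes = 300 MB -- easily one machine's worth
```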


In a way what you describe reminds me of Wibidata, since it built
recommender-like solutions on top of data and results published to a
NoSQL store. You might glance at the related OSS project Kiji
(http://kiji.org/) for ideas about how to manage the schema.

You should have a look at things like Nick's architecture for
Graphflow; however, it's more concerned with computing recommendations
on the fly, and describes a shift away from an architecture originally
built around something like a NoSQL store:
http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf

This is also the kind of ground the Oryx project is intended to cover,
something I've worked on personally:
https://github.com/OryxProject/oryx   -- a layer on and around the
core model building in Spark + Spark Streaming to provide a whole
recommender (for example), down to the REST API.


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Software stack for Recommendation engine with spark mlib

2015-03-15 Thread Shashidhar Rao
Hi,

Can anyone who has developed a recommendation engine suggest a possible
software stack for such an application?

I am basically new to recommendation engines; I just found out that Mahout
and Spark MLlib are available.
I am thinking of the software stack below:

1. The user is going to use an Android app.
2. REST API calls sent from the Android app to the app server to get
recommendations.
3. Spark MLlib as the core recommendation engine.
4. MongoDB as the database backend.

I would like to know more about the cluster configuration (how many nodes,
etc.) of Spark for calculating the recommendations for 500,000 items. These
items include products for day care, etc.

Other software stack suggestions would also be very useful. It has to run on
multiple vendors' machines.

Please suggest.

Thanks
shashi
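One sketch of how steps 2-4 of the stack above could fit together: the app server's REST handler becomes little more than a keyed lookup of precomputed recommendations, with a fallback for unseen users. A plain dict stands in for the MongoDB collection here, and every name is an illustrative assumption:

```python
# Hypothetical serving-layer sketch. A dict stands in for MongoDB; the
# offline Spark job would be what actually writes PRECOMPUTED.
PRECOMPUTED = {
    "user42": ["diapers", "wipes", "formula"],
}

def get_recommendations(user_id, store=PRECOMPUTED,
                        fallback=("popular-1", "popular-2")):
    """What a GET /recommendations/<user_id> endpoint would return:
    the user's precomputed list, or a popularity fallback for new users."""
    return list(store.get(user_id, fallback))

get_recommendations("user42")   # precomputed list
get_recommendations("unknown")  # cold-start fallback
```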


Re: Software stack for Recommendation engine with spark mlib

2015-03-15 Thread Shashidhar Rao
Thanks Nick, for your suggestions.



Re: Software stack for Recommendation engine with spark mlib

2015-03-15 Thread Nick Pentreath
As Sean says, precomputing recommendations is pretty inefficient. Though with
500K items it's easy to get all the item vectors in memory, so pre-computing
is not too bad.

Still, since you plan to serve these via a REST service anyway, computing on
demand via a serving layer such as Oryx or PredictionIO (or the newly
open-sourced Seldon.io) is a good option. You can also cache the
recommendations quite aggressively: once you compute a user or item top-K
list, just stick the result in memcache / Redis / whatever and evict it when
you recompute your offline model, or every hour or so.
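The cache-and-evict idea above could be sketched as follows (illustrative Python, not Oryx or PredictionIO code): tag each cached top-K list with a model version, and bump the version when the offline model is recomputed so stale entries are lazily discarded:

```python
# Illustrative version-tagged recommendation cache; stands in for
# memcache/Redis plus an eviction-on-retrain policy.
class RecCache:
    def __init__(self):
        self.version = 0
        self._cache = {}          # user_id -> (version, top_k_list)

    def get(self, user_id, compute):
        entry = self._cache.get(user_id)
        if entry and entry[0] == self.version:
            return entry[1]       # fresh hit
        top_k = compute(user_id)  # recompute on miss or stale entry
        self._cache[user_id] = (self.version, top_k)
        return top_k

    def new_model(self):
        self.version += 1         # lazily evicts every cached list

cache = RecCache()
calls = []
compute = lambda u: calls.append(u) or ["a", "b"]
cache.get("u1", compute)  # miss: computes
cache.get("u1", compute)  # hit: served from cache
cache.new_model()
cache.get("u1", compute)  # stale: recomputes under the new model
```

With Redis or memcache you would get the same effect via a TTL, or by prefixing keys with the model version.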
—
Sent from Mailbox
