Re: Content based recommender using lucene/solr

Lance Norskog Sat, 29 Jun 2013 17:51:09 -0700

Solr/Lucene has two features for this:
1) the MoreLikeThis code, and
2) the clustering project in solr/contrib.
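
As a rough illustration of the first option, here is a minimal sketch of how a MoreLikeThis request might be assembled. It only builds the URL and sends nothing. The host, the core name `items`, the `/mlt` handler path (which has to be registered in solrconfig.xml), and the field names `title` and `description` are all assumptions for the example, not details from this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_mlt_url(base_url, doc_id, fields, rows=10):
    """Build a request URL for a Solr MoreLikeThis handler (no request is sent)."""
    params = {
        "q": f"id:{doc_id}",         # the item the user is currently viewing
        "mlt.fl": ",".join(fields),  # fields whose terms drive the similarity
        "mlt.mintf": 1,              # ignore terms rarer than this in the source doc
        "mlt.mindf": 1,              # ignore terms found in fewer docs than this
        "rows": rows,                # how many similar docs to return
        "wt": "json",
    }
    return f"{base_url}/mlt?{urlencode(params)}"

# Hypothetical host, core, and field names, for illustration only.
url = build_mlt_url("http://localhost:8983/solr/items", "123", ["title", "description"])
print(url)
print(parse_qs(urlsplit(url).query)["mlt.fl"])
```

Raising `mlt.mintf` and `mlt.mindf` trims rare or noisy terms from the generated query, so those two are usually the first knobs worth tuning.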


Lance

On 06/28/2013 11:15 AM, Luis Carlos Guerrero Covo wrote:

I only have about a million docs right now, so scaling is not a big issue.
I'm looking to provide a quick implementation and then worry about scale
when I get around to implementing a more robust recommender. I'm looking at
a content-based approach because we are not tracking users or the items
they view. I was thinking of using MoreLikeThis as Walter mentioned, but
wanted some feedback on the nuances required for a proper implementation,
such as basing the similarity on Euclidean distance, normalizing numerical
field values, and computing collection-wide stats like mean and variance.
Thank you for the link, Otis. I will watch it right away.
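
To make the normalization point concrete, here is a minimal sketch in plain Python, outside Lucene/Solr entirely. It z-scores each numeric field using the collection-wide mean and standard deviation, then ranks items by Euclidean distance to the target. The item ids and the two made-up attributes are assumptions for illustration only.

```python
import math
from statistics import mean, pstdev

def zscore_columns(rows):
    """Scale each numeric column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero on constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

def most_similar(items, target_id, n=2):
    """Return the ids of the n items closest to target_id after normalization."""
    ids = list(items)
    normalized = dict(zip(ids, zscore_columns([items[i] for i in ids])))
    target = normalized[target_id]
    scored = [(math.dist(target, vec), i) for i, vec in normalized.items() if i != target_id]
    return [i for _, i in sorted(scored)[:n]]

# Hypothetical items with two numeric attributes on very different scales.
items = {
    "a": [100.0, 2.0],
    "b": [110.0, 2.0],
    "c": [500.0, 5.0],
    "d": [105.0, 3.0],
}
print(most_similar(items, "a"))
```

On the raw values, "d" is closer to "a" than "b" is, because the first attribute's larger scale dominates the distance. After normalization the order flips and "b" comes first.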


On Fri, Jun 28, 2013 at 1:12 PM, Otis Gospodnetic <[email protected]> wrote:

Hi,

It doesn't have to be one or the other.  In the past I've built a news
recommender engine based on CF (Mahout) and combined it with a content
similarity-based engine (it wasn't Solr/Lucene, but something custom that
worked with ngrams, though it might as well have been Lucene/Solr/ES).  It
worked well.  If you haven't worked with Mahout before, I'd suggest starting
with the approach in that video and moving to Mahout only if that approach
proves limiting.

See Ted's stuff on this topic, too:
http://www.slideshare.net/tdunning/search-as-recommendation +
http://berlinbuzzwords.de/sessions/multi-modal-recommendation-algorithms
(note: Mahout, Solr, Pig)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 2:07 PM, Saikat Kanjilal <[email protected]> wrote:

You could build a custom recommender in Mahout to accomplish this. Also,
just out of curiosity, why the content-based approach as opposed to building
a recommender based on co-occurrence? One other thing: what is your data
size? Are you looking at a scale where you need something like Hadoop?

From: [email protected]
Date: Fri, 28 Jun 2013 13:02:00 -0500
Subject: Re: Content based recommender using lucene/solr
To: [email protected]
CC: [email protected]

Hey Saikat, thanks for your suggestion. I've looked into Mahout and other
alternatives for computing k nearest neighbors. I would have to run a job
to compute the k nearest neighbors and track them in the index for
retrieval. I wanted to see if this was something I could do with Lucene,
using Lucene's scoring function and Solr's MoreLikeThis component. The job
you specifically mention is for item-based recommendation, which would
require me to track the different items users have viewed. I'm looking for
a content-based approach where I would use a distance measure to establish
how near (how similar) items are and have some kind of training phase to
adjust weights.
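
The offline variant described above (compute each item's k nearest neighbors in a job, then store them for retrieval) can be sketched roughly as follows. This is a brute-force illustration under stated assumptions, not Mahout's ItemSimilarityJob. The per-attribute weights are set by hand here, standing in for whatever a training phase would produce, and the item vectors are made up.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance in which each attribute carries its own weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_table(vectors, weights, k=2):
    """Map every item id to the ids of its k nearest neighbors (brute force)."""
    table = {}
    for item_id, vec in vectors.items():
        scored = sorted(
            (weighted_distance(vec, other, weights), other_id)
            for other_id, other in vectors.items()
            if other_id != item_id
        )
        table[item_id] = [other_id for _, other_id in scored[:k]]
    return table

# Hypothetical, already-normalized attribute vectors.
vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0], "d": [3.0, 3.0]}
neighbors = knn_table(vectors, weights=[1.0, 0.1], k=2)
print(neighbors["a"])
```

Down-weighting the second attribute moves "c" ahead of "b" as the nearest neighbor of "a"; with equal weights the order is reversed. The pairwise loop is quadratic in the number of items, so at around a million docs it would need batching or an approximate method.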


On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <[email protected]> wrote:

Why not just use Mahout to do this? There is an item similarity algorithm
in Mahout that does exactly this :)

https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html

You can use mahout in distributed and non-distributed mode as well.

From: [email protected]
Date: Fri, 28 Jun 2013 12:16:57 -0500
Subject: Content based recommender using lucene/solr
To: [email protected]; [email protected]

Hi,

I'm using Lucene and Solr right now in a production environment with an
index of about a million docs. I'm working on a recommender that would
basically list the n most similar items to the user, based on the current
item he is viewing.

I've been thinking of using Solr/Lucene since I already have all docs
available and I want a quick version that can be deployed while we work on
a more robust recommender. How about overriding the default similarity so
that it scores documents based on the Euclidean distance of normalized item
attributes, and then using a MoreLikeThis component to pass in the
attributes of the item for which I want to generate recommendations? I know
it has its issues, like recomputing scores, normalization, and weight
application at query time, which could make this idea infeasible or
impractical. I'm at a very preliminary stage right now with this and would
love some suggestions from experienced users.

thank you,

Luis Guerrero



--
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047



