Solr/Lucene has two features for this:
1) the MoreLikeThis code, and
2) the clustering project in solr/contrib.

Lance

On 06/28/2013 11:15 AM, Luis Carlos Guerrero Covo wrote:
I only have about a million docs right now, so scaling is not a big issue.
I'm looking to provide a quick implementation and then worry about scale
when I get around to implementing a more robust recommender. I'm looking at
a content-based approach because we are not tracking users or the items they
view. I was thinking of using MoreLikeThis, as Walter mentioned, but I
wanted some feedback on the nuances required for a proper implementation,
such as a similarity based on Euclidean distance, normalizing numerical
field values, and computing collection-wide stats like mean and variance.
Thank you for the link, Otis. I will watch it right away.
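
[Editor's sketch] The normalization and distance steps described above could look roughly like this outside of Lucene. It is a minimal illustration, not anything from the thread: the function names and the sample `items` values are made up, and it assumes every item is a fixed-length list of numeric attributes small enough to hold in memory.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale each numeric column to zero mean and unit variance.

    Returns the normalized rows plus the per-column (mean, stddev) pairs,
    which are the collection-wide stats mentioned in the message.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    normalized = [
        [(value - mu) / sigma if sigma > 0 else 0.0
         for value, (mu, sigma) in zip(row, stats)]
        for row in rows
    ]
    return normalized, stats


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(index, rows, n):
    """Return the indices of the n rows closest to rows[index], nearest first."""
    target = rows[index]
    others = [i for i in range(len(rows)) if i != index]
    return sorted(others, key=lambda i: euclidean(target, rows[i]))[:n]


# Hypothetical items with two numeric attributes each.
items = [[10.0, 200.0], [12.0, 210.0], [50.0, 900.0], [11.0, 190.0]]
normalized, stats = zscore_columns(items)
neighbors = most_similar(0, normalized, 2)
```

This brute-force scan compares the current item against every other item, so it only shows the arithmetic; at a million docs the neighbors would more likely be precomputed or approximated.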


On Fri, Jun 28, 2013 at 1:12 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

Hi,

It doesn't have to be one or the other.  In the past I've built a news
recommender engine based on CF (Mahout) and combined it with a content
similarity-based engine (it wasn't Solr/Lucene, but something custom that
worked with ngrams; it might as well have been Lucene/Solr/ES).  It
worked well.  If you haven't worked with Mahout before, I'd suggest starting
with the approach in that video and moving to Mahout only if that approach
proves limiting.

See Ted's stuff on this topic, too:
http://www.slideshare.net/tdunning/search-as-recommendation +
http://berlinbuzzwords.de/sessions/multi-modal-recommendation-algorithms
(note: Mahout, Solr, Pig)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 2:07 PM, Saikat Kanjilal <sxk1...@hotmail.com>
wrote:
You could build a custom recommender in Mahout to accomplish this. Also,
just out of curiosity, why the content-based approach as opposed to building
a recommender based on co-occurrence? One other thing: what is your data
size? Are you looking at a scale where you need something like Hadoop?
From: lcguerreroc...@gmail.com
Date: Fri, 28 Jun 2013 13:02:00 -0500
Subject: Re: Content based recommender using lucene/solr
To: solr-u...@lucene.apache.org
CC: java-user@lucene.apache.org

Hey Saikat, thanks for your suggestion. I've looked into Mahout and other
alternatives for computing k nearest neighbors. I would have to run a job
to compute the k nearest neighbors and track them in the index for
retrieval. I wanted to see if this was something I could do with Lucene,
using Lucene's scoring function and Solr's MoreLikeThis component. The job
you specifically mention is for item-based recommendation, which would
require me to track the different items users have viewed. I'm looking for
a content-based approach where I would use a distance measure to establish
how near (how similar) items are, and have some kind of training phase to
adjust weights.
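
[Editor's sketch] One way to read "a distance measure with adjustable weights" is a weighted Euclidean distance, where each attribute's squared difference is scaled by its own weight. The function below is only an illustration under that assumption. The thread does not say how the weights would be trained, so they are passed in as a plain list here.

```python
import math


def weighted_euclidean(a, b, weights):
    """Euclidean distance with a non-negative weight per attribute.

    A weight of 0 ignores an attribute; larger weights make differences
    in that attribute count for more.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
```

With all weights set to 1 this reduces to the ordinary Euclidean distance, so a training phase could start from that baseline and adjust from there.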


On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <sxk1...@hotmail.com>
wrote:
Why not just use Mahout to do this? There is an item similarity algorithm
in Mahout that does exactly this :)

https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html

You can use Mahout in distributed and non-distributed mode as well.

From: lcguerreroc...@gmail.com
Date: Fri, 28 Jun 2013 12:16:57 -0500
Subject: Content based recommender using lucene/solr
To: solr-u...@lucene.apache.org; java-user@lucene.apache.org

Hi,

I'm using Lucene and Solr right now in a production environment with an
index of about a million docs. I'm working on a recommender that basically
would list the n most similar items to the user based on the current item
he is viewing.

I've been thinking of using Solr/Lucene since I already have all docs
available and I want a quick version that can be deployed while we work on
a more robust recommender. How about overriding the default similarity so
that it scores documents based on the Euclidean distance of normalized item
attributes, and then using a MoreLikeThis component to pass in the
attributes of the item for which I want to generate recommendations? I know
it has its issues, like recomputing scores, normalization, and weight
application at query time, which could make this idea infeasible or
impractical. I'm at a very preliminary stage right now with this and would
love some suggestions from experienced users.
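
[Editor's sketch] For the stock MoreLikeThis part of this idea (without any custom similarity), a request could be assembled roughly as below. This assumes a MoreLikeThisHandler is registered at `/mlt` in solrconfig.xml and that the unique key field is named `id`. The core URL and the field names `title` and `description` are placeholders. Note that MoreLikeThis picks "interesting terms" from text fields, so by itself it does not compute a Euclidean distance over numeric attributes.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, doc_id, fields, count=10):
    """Build a Solr MoreLikeThis request URL for one source document.

    base_url is the core URL (placeholder), doc_id identifies the item
    being viewed, and fields lists the fields to mine for similar terms.
    """
    params = {
        "q": f"id:{doc_id}",          # selects the source document
        "mlt.fl": ",".join(fields),   # fields used to find similar docs
        "mlt.mintf": 1,               # minimum term frequency in the source doc
        "mlt.mindf": 1,               # minimum document frequency in the index
        "rows": count,                # number of similar docs to return
        "wt": "json",
    }
    return f"{base_url}/mlt?{urlencode(params)}"


# Hypothetical core URL and field names.
url = build_mlt_url("http://localhost:8983/solr/items", "123",
                    ["title", "description"], count=5)
```

The returned string can be fetched with any HTTP client. The sketch stops at building the URL so that it runs without a live Solr instance.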

thank you,

Luis Guerrero



--
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047




