Re: Getting Taste to work on 10M dataset

Vinicius Carvalho Fri, 29 Jan 2010 18:50:54 -0800

On Sat, Jan 30, 2010 at 12:40 AM, Sean Owen <[email protected]> wrote:


> On Sat, Jan 30, 2010 at 2:34 AM, Vinicius Carvalho
> <[email protected]> wrote:
> > I'm trying 5.1.10, the latest one available in the Maven repositories.
> > I'm running it right now; since it takes a while, I'll report the results.
>
> OK but this would be something you can check in your table right now.
> No columns should be nullable, or have nulls. If they do, that's the
> problem.
>
I checked the DB; there are no null columns. Also, the table was built using the
sample CREATE TABLE provided in the source code:

    CREATE TABLE taste_preferences (
      user_id BIGINT NOT NULL,
      item_id BIGINT NOT NULL,
      preference FLOAT NOT NULL,
      PRIMARY KEY (user_id, item_id),
      INDEX (user_id),
      INDEX (item_id)
    )

>
> > At first I'm just creating the SlopeOneRecommender. I did not even get to
> > the actual code; all that time is spent constructing the object.
>
> OK then it's the time spent in building diffs.
>
>
> > You mean for the DiffStorage, right? Keeping the DataModel in JDBC would
> > be good, right? I'm interested in item2item recommendations. I did this
> > before
>
> For both, 10M ratings isn't terribly big. I think you can get it into
> memory in 2GB, plus the diffs, if you cap the number of diffs at some
> reasonable value.
>
The only problem with an in-memory DataModel would be volatility, I guess.
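To make the "cap the number of diffs" idea concrete, here is a minimal plain-Java sketch of how slope-one diffs can be accumulated with an upper bound on stored item pairs. It is not Taste code and does not claim to match how the library's DiffStorage works; the class name `CappedSlopeOneDiffs`, the `maxEntries` parameter, and the policy of ignoring new pairs once the cap is hit are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of slope-one diffs with a cap on stored item pairs (not Taste code). */
final class CappedSlopeOneDiffs {

    /** Running total of (pref(highItem) - pref(lowItem)) over users who rated both. */
    private static final class Diff {
        double sum;
        int count;
    }

    // lowItemID -> highItemID -> diff; each unordered pair is stored once.
    private final Map<Long, Map<Long, Diff>> diffs = new HashMap<>();
    private final long maxEntries; // hypothetical cap on the number of stored pairs
    private long entries;

    CappedSlopeOneDiffs(long maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Folds one user's ratings (itemID -> preference) into the diff table. */
    void addUser(Map<Long, Double> prefs) {
        for (Map.Entry<Long, Double> low : prefs.entrySet()) {
            for (Map.Entry<Long, Double> high : prefs.entrySet()) {
                if (low.getKey() >= high.getKey()) {
                    continue; // visit each pair once, lower item ID first
                }
                Map<Long, Diff> row = diffs.get(low.getKey());
                Diff diff = (row == null) ? null : row.get(high.getKey());
                if (diff == null) {
                    if (entries >= maxEntries) {
                        continue; // cap reached: ignore new pairs, keep updating known ones
                    }
                    diff = new Diff();
                    diffs.computeIfAbsent(low.getKey(), k -> new HashMap<>())
                         .put(high.getKey(), diff);
                    entries++;
                }
                diff.sum += high.getValue() - low.getValue();
                diff.count++;
            }
        }
    }

    /** Average of (pref(to) - pref(from)), or null if the pair was never stored. */
    Double averageDiff(long from, long to) {
        long lo = Math.min(from, to);
        long hi = Math.max(from, to);
        Map<Long, Diff> row = diffs.get(lo);
        Diff diff = (row == null) ? null : row.get(hi);
        if (diff == null) {
            return null;
        }
        double average = diff.sum / diff.count; // average of pref(hi) - pref(lo)
        return (from == lo) ? average : -average;
    }

    /** Number of item pairs currently stored. */
    long size() {
        return entries;
    }

    public static void main(String[] args) {
        CappedSlopeOneDiffs table = new CappedSlopeOneDiffs(1);
        table.addUser(Map.of(1L, 5.0, 2L, 3.0));
        table.addUser(Map.of(1L, 3.0, 2L, 4.0, 3L, 1.0));
        System.out.println("pairs stored: " + table.size());
        System.out.println("average diff 1 -> 2: " + table.averageDiff(1, 2));
    }
}
```

The number of pairs grows roughly with the square of the number of items each user has rated, which is why building diffs dominates construction time and why a cap keeps memory bounded. The trade-off is that pairs first seen after the cap is reached never contribute to predictions.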


> > using Taste by hand, computing the SimilarityMatrix and storing it in the
> > DB. (I used the book Collective Intelligence in Action as a reference) and
> > it worked fine. The SimilarityMatrix took a while to recalculate, but
> > it was a batch job running every hour. After that, computing
> > recommendations was a breeze.
>
> You mean you are interested in item-based recommenders, or
> recommending items to other items?
>

That would be item-based recommenders.

>
> Slope-one wouldn't have anything to do with item-item similarities, it
> works a bit differently. Yes, you could pre-compute similarities and
> use them with a custom ItemSimilarity implementation which reads from
> a DB table, and use that with GenericItemBasedRecommender.
>
> You could also do the similarity calculations with something like
> PearsonCorrelationSimilarity, and store that in the DB, and proceed
> with the above. Again, you'd have to write a little code, but it's
> pretty easy.
>
> Or you could skip the DB altogether and let it compute item-item
> similarities on the fly.
>
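As a rough illustration of the pre-computation Sean describes, the following plain-Java sketch computes Pearson correlation between every pair of items and keeps the results in a lookup table. It deliberately avoids Taste classes, so it is not the library's PearsonCorrelationSimilarity; the class name `ItemPearson`, the `"lowID:highID"` key format, and the in-memory map (standing in for a DB table) are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of pre-computing item-item Pearson similarities (not Taste code). */
final class ItemPearson {

    /**
     * Pearson correlation between two items over the users who rated both.
     * Each map is userID -> preference for one item.
     * Returns NaN when fewer than two users overlap or when either variance is zero.
     */
    static double similarity(Map<Long, Double> itemA, Map<Long, Double> itemB) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : itemA.entrySet()) {
            Double b = itemB.get(e.getKey());
            if (b == null) {
                continue; // this user did not rate the second item
            }
            double a = e.getValue();
            n++;
            sumA += a;
            sumB += b;
            sumAA += a * a;
            sumBB += b * b;
            sumAB += a * b;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        if (varA <= 0 || varB <= 0) {
            return Double.NaN;
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Builds the all-pairs table; undefined (NaN) similarities are left out. */
    static Map<String, Double> precompute(Map<Long, Map<Long, Double>> prefsByItem) {
        Map<String, Double> table = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> a : prefsByItem.entrySet()) {
            for (Map.Entry<Long, Map<Long, Double>> b : prefsByItem.entrySet()) {
                if (a.getKey() >= b.getKey()) {
                    continue; // similarity is symmetric, so store each pair once
                }
                double s = similarity(a.getValue(), b.getValue());
                if (!Double.isNaN(s)) {
                    table.put(key(a.getKey(), b.getKey()), s);
                }
            }
        }
        return table;
    }

    /** Order-independent key for an item pair (hypothetical format). */
    static String key(long x, long y) {
        return Math.min(x, y) + ":" + Math.max(x, y);
    }

    /** Returns the stored similarity, or null if the pair has none. */
    static Double lookup(Map<String, Double> table, long x, long y) {
        return table.get(key(x, y));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefsByItem = new HashMap<>();
        prefsByItem.put(1L, Map.of(10L, 1.0, 11L, 2.0, 12L, 3.0));
        prefsByItem.put(2L, Map.of(10L, 2.0, 11L, 4.0, 12L, 6.0));
        Map<String, Double> table = precompute(prefsByItem);
        System.out.println("similarity(1, 2) = " + lookup(table, 1, 2));
    }
}
```

In the setup discussed in this thread, the table would be written to the database by the batch job, and a custom ItemSimilarity implementation would read from it at recommendation time instead of recomputing correlations.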

I'll try your ideas and post the results. Thanks for all the help, Sean.

-- 
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.
