My own feeling is that we need to get some sort of recommender that supports
side information, possibly also as a classifier.

As everybody knows, I have lately been quite enamored of Menon and Elkan's
paper on Latent Factor Log-Linear models.  It seems to subsume most other
factorization methods and supports side data very naturally.  Training is
reportedly very fast using SGD techniques.

The paper is here: http://arxiv.org/abs/1006.2156

On Mon, Oct 4, 2010 at 7:03 AM, Sebastian Schelter wrote:

> Hi,
>
> The amount of work currently being put into finishing 0.4 is amazing. I can
> hardly follow all the mails; very cool to see that. I had some time today
> to write down ideas for features for version 0.5 and want to share them
> here for feedback.
>
> First, I can think of some possible new features for RecommenderJob:
>
>  * add an option that makes the RecommenderJob use the output of the
>    related o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob instead
>    of computing the similarities again each time, this will give users the
>    possibility to choose the interval in which to precompute the item
>    similarities
>
>  * add an option to make the RecommenderJob include "recommended because
>    of" items to each recommended item (analogous to what is already
>    available at GenericItemBasedRecommender.recommendedBecause(...)),
>    showing this to users helps them understand why some item was
>    recommended to them
>
>
> Second, I'd like Mahout to have a Map/Reduce implementation of the algorithm
> described in Y. Zhou et al.: "Large-scale Parallel Collaborative Filtering
> for the Netflix Prize" (http://bit.ly/cUPgqr).
>
> Here R is the matrix of ratings of users towards movies and each user and
> each movie is projected on a "feature" space (the number of features is
> defined before) so that the product of the resulting matrices U and M is a
> low-rank approximation/factorization of R.
>
> Determining U and M is mathematically modelled as an optimization problem
> and additionally some regularization is applied to avoid overfitting to the
> known entries. This problem is solved with an iterative approach called
> alternating least squares (ALS).
>
> If I understand the paper correctly this approach is easily parallelizable.
> In order to estimate a user feature vector you only need access to all of
> his/her ratings and the feature vectors of all movies he/she rated. To estimate a
> movie feature vector you need access to all its ratings and to the feature
> vectors of the users who rated it.
>
> An unknown preference can then be predicted by computing the dot product of
> the corresponding user and movie feature vectors.
>
> It would be very nice if someone who is familiar with the paper or has the
> time for a brief look into it could validate that, because I don't fully trust
> my mathematical analysis.
>
> I already created a first prototype implementation but I definitely need
> help from someone checking it conceptually, optimizing the math-related
> parts and helping me test it. Maybe that could be an interesting task for the
> upcoming Mahout hackathon in Berlin.
>
> --sebastian
>
> PS: @isabel I won't make it to the dinner today, need to rehearse my
> talk...
>
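
For reference, the per-user and per-movie updates described in the quoted
mail can be sketched roughly as below. This is a minimal single-machine
Python sketch, not Sebastian's prototype and not Mahout code. The function
names (solve, update, predict, als), the parameters k and lam, and the choice
to scale the regularization by the number of ratings (which is how I read the
paper's weighted regularization) are all assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update(ratings, others, k, lam):
    """Recompute one feature vector by regularized least squares.

    ratings: {other_id: rating} for this one user (or movie).
    others:  {other_id: feature vector} for the movies (or users) involved.
    Only this local data is needed, so every call is independent.
    """
    n = len(ratings)
    # Normal equations: (sum v v^T + lam * n * I) x = sum r v
    a = [[lam * n if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for oid, r in ratings.items():
        v = others[oid]
        for i in range(k):
            b[i] += r * v[i]
            for j in range(k):
                a[i][j] += v[i] * v[j]
    return solve(a, b)


def predict(u, m):
    """Predicted preference: dot product of user and movie feature vectors."""
    return sum(x * y for x, y in zip(u, m))


def als(by_user, k=2, lam=0.05, iterations=20, seed=0):
    """Alternate between fixing M to solve for U and fixing U to solve for M."""
    rng = random.Random(seed)
    by_movie = {}
    for uid, rs in by_user.items():
        for mid, r in rs.items():
            by_movie.setdefault(mid, {})[uid] = r
    # Hypothetical initialization: small random movie features.
    M = {mid: [rng.random() for _ in range(k)] for mid in by_movie}
    U = {}
    for _ in range(iterations):
        U = {uid: update(rs, M, k, lam) for uid, rs in by_user.items()}
        M = {mid: update(rs, U, k, lam) for mid, rs in by_movie.items()}
    return U, M
```

Each update() call reads only one user's (or one movie's) ratings plus the
feature vectors on the other side, which is the property that would let the
two dictionary comprehensions in als() be distributed. Calling
als({"u1": {"a": 5.0, "b": 3.0}, "u2": {"a": 4.0}}) returns the two feature
maps, and predict(U["u2"], M["b"]) then estimates an unseen preference.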
