[ https://issues.apache.org/jira/browse/MAHOUT-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964229#comment-13964229 ]
Pat Ferrel commented on MAHOUT-1422:
------------------------------------
Well, I'm no Ted, but for what it's worth...
Downsampling used to be done when the preference matrix was created, as I
recall; then it was moved into RSJ. In the former case the downsampling could
be done separately for each matrix. I think I mentioned this at the time, but
it probably didn't sink in.
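Per-matrix downsampling, as it worked before the move into RSJ, can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not Mahout code: the `downsample_rows` helper and its `max_per_row` cap are hypothetical, each matrix is assumed to fit in memory as a map from user ID to item IDs (rather than being a DRM), and uniform random sampling stands in for whatever strategy a real job would use. The only point is that each matrix gets its own cap.

```python
import random
from typing import Dict, List


def downsample_rows(rows: Dict[str, List[str]], max_per_row: int,
                    seed: int = 42) -> Dict[str, List[str]]:
    """Keep at most `max_per_row` non-zero columns in each row.

    Hypothetical helper: `rows` maps a row key (a user ID) to the column keys
    (item IDs) that are non-zero in that row. Rows already at or under the cap
    are copied unchanged; longer rows are sampled uniformly at random.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled: Dict[str, List[str]] = {}
    for key, cols in rows.items():
        if len(cols) <= max_per_row:
            sampled[key] = list(cols)
        else:
            sampled[key] = rng.sample(cols, max_per_row)
    return sampled


# Each matrix gets its own cap (the values here are arbitrary).
purchases = {"u1": ["i1", "i2", "i3", "i4"], "u2": ["i1"]}
searches = {"u1": ["t1", "t2"], "u2": ["t3", "t4", "t5", "t6", "t7", "t8"]}
purchases_ds = downsample_rows(purchases, max_per_row=2)
searches_ds = downsample_rows(searches, max_per_row=5)
```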
The case I'm looking at is a cross-recommender where the matrices will have
identical dimensions, so downsampling each one the same way would probably be
OK. But there are other uses of an XRSJ that Ted will have to speak to. I
believe one is combining search-term clicks with user purchases to get recs
from search terms for items to purchase. This would mean a different item
space for each matrix with an identical user space (otherwise the multiply
wouldn't work). Since the downsampling dimension could be vastly different for
each matrix, I imagine the 'ideal' is to allow each matrix its own
downsampling parameter.
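To make the "identical user space, different item spaces" point concrete, here is a minimal Python sketch of the cross-cooccurrence product A'B, where A holds search-term clicks and B holds purchases, both keyed by the same user IDs. The function name and the in-memory dict representation are assumptions for illustration only; both inputs are assumed to have already been downsampled, each with its own cap.

```python
from collections import Counter
from typing import Dict, List


def cross_cooccurrence(a: Dict[str, List[str]],
                       b: Dict[str, List[str]]) -> Counter:
    """Compute the non-zero entries of A'B for two binary user x item matrices.

    Hypothetical helper: `a` and `b` map user IDs to the items each user has in
    that matrix. The item spaces may differ (search terms vs purchased items),
    but the rows must be the same users, otherwise A'B is undefined.
    Entry (i, j) counts the users who have item i in A and item j in B.
    """
    counts: Counter = Counter()
    for user, a_items in a.items():
        b_items = b.get(user)
        if not b_items:
            continue  # a user absent from B contributes nothing to A'B
        for i in set(a_items):
            for j in set(b_items):
                counts[(i, j)] += 1
    return counts


searches = {"u1": ["t1", "t2"], "u2": ["t1"], "u3": ["t2"]}
purchases = {"u1": ["i1"], "u2": ["i1", "i2"], "u4": ["i2"]}
cooc = cross_cooccurrence(searches, purchases)
# cooc[("t1", "i1")] == 2 (u1 and u2), cooc[("t2", "i1")] == 1 (u1)
```

Users present in only one of the matrices (u3 and u4 here) drop out of the product, which is why the two inputs must share a user space.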
The cross-recommender in the solr-recommender example uses a matrix multiply
(instead of the XRSJ, obviously), and since downsampling was taken out of
matrix creation, there is none. But we don't know how well this works anyway,
so it's somewhat beside the point.
> Make a version of RSJ that uses two inputs
> ------------------------------------------
>
> Key: MAHOUT-1422
> URL: https://issues.apache.org/jira/browse/MAHOUT-1422
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 1.0
> Environment: mapreduce
> Reporter: Pat Ferrel
> Labels: recommender, similarity
> Fix For: 1.0
>
>
> Currently the RowSimilarityJob uses a similarity measure to pairwise compare
> all rows in a DistributedRowMatrix.
> For many applications, including a cross-action recommender, we need something
> like RSJ that takes two DRMs and compares matching rows of each. The output
> would be in the same form as RSJ's, and ideally it would allow the use of any
> similarity type already defined, especially LLR.
> There are two implementations of a Cross-Recommender, one based on the Mahout
> RecommenderJob and another based on Solr, that can immediately benefit from
> a Cross-RSJ.
> A modification of the matrix multiply job may be a place to start, since the
> current RSJ seems to rely heavily on self-similarity.
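Scoring cross-matrix pairs with LLR needs only four counts per (A item, B item) pair. The following is a minimal Python sketch, not Mahout's implementation: it uses Dunning's log-likelihood ratio (G²) written in entropy form and assumes the A'B co-occurrence counts, the number of users per column of each matrix, and the total number of users are already known. `cross_llr` and its arguments are hypothetical names.

```python
import math
from typing import Dict, Tuple


def x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    # Unnormalized entropy: N log N - sum(k log k).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 for a 2x2 contingency table, computed via entropies."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative rounding error


def cross_llr(cooc: Dict[Tuple[str, str], int], count_a: Dict[str, int],
              count_b: Dict[str, int],
              num_users: int) -> Dict[Tuple[str, str], float]:
    """Score each non-zero A'B entry (i, j) with LLR (hypothetical helper)."""
    scores = {}
    for (i, j), k11 in cooc.items():
        k12 = count_b[j] - k11             # users with j in B but not i in A
        k21 = count_a[i] - k11             # users with i in A but not j in B
        k22 = num_users - k11 - k12 - k21  # users with neither
        scores[(i, j)] = llr(k11, k12, k21, k22)
    return scores


# Four users: t1 always co-occurs with purchase i1; t2 is independent of it.
cooc = {("t1", "i1"): 2, ("t2", "i1"): 1}
count_a = {"t1": 2, "t2": 2}  # users who clicked each search term
count_b = {"i1": 2}           # users who bought each item
scores = cross_llr(cooc, count_a, count_b, num_users=4)
```

Here `scores[("t1", "i1")]` is 8 ln 2 (about 5.55) and `scores[("t2", "i1")]` is 0, so only the genuinely associated pair gets a non-zero score.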