[jira] [Commented] (MAHOUT-610) Not all Coocurrences provided to SimilarityReducer

Sebastian Schelter (JIRA) Tue, 22 Mar 2011 06:15:48 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009635#comment-13009635 ]


Sebastian Schelter commented on MAHOUT-610:
-------------------------------------------

Yes, it's fixed.

> Not all Coocurrences provided to SimilarityReducer
> --------------------------------------------------
>
>                 Key: MAHOUT-610
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-610
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering, Math
>            Reporter: Joris Geessels
>            Assignee: Sebastian Schelter
>             Fix For: 0.5
>
>         Attachments: mahout-610.patch
>
>
> While doing some tests with the RecommenderJob, and more specifically the
> RowSimilarityJob, I noticed that in some cases not all cooccurrences are
> used in the similarity calculations (done in the SimilarityReducer class).
> A RowPair object with (RowA=1,RowB=2) isn't considered the same as
> (RowA=2,RowB=1). This causes problems because CooccurrencesMapper sometimes
> emits row pairs in the first form and sometimes in the second, which splits
> the cooccurrences of one pair across two keys. If I'm right, this happens
> because the ordering of the WeightedCoocurrenceArray for one column isn't
> guaranteed to be the same as for another column.
> The solution is very simple: either change the compare method of the
> RowPair class, or adapt the CooccurrencesMapper to enforce that
> RowA < RowB.
> I hope I haven't missed something obvious and that this isn't intended
> behavior. If it is, please enlighten me :-)
> Also, slightly off topic: while doing these tests, I noticed that the
> predictions are all remarkably high and that the RMSE on the MovieLens 100k
> dataset is around 1.6.
> That seems a bit too high to me. Are these normal values, or am I doing
> something wrong?
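
For illustration only, the second option from the description (enforcing RowA < RowB before a pair is emitted) could look roughly like the sketch below. RowPairSketch and its methods are hypothetical stand-ins written for this example, not Mahout's actual classes and not the attached patch; the sketch only shows that (1,2) and (2,1) collapse onto a single key once the pair is put into canonical order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a row-pair key; NOT the real Mahout class.
final class RowPairSketch {
    final int rowA;
    final int rowB;

    private RowPairSketch(int rowA, int rowB) {
        this.rowA = rowA;
        this.rowB = rowB;
    }

    // Canonical form: the smaller row index always comes first,
    // so of(1, 2) and of(2, 1) describe the same key.
    static RowPairSketch of(int x, int y) {
        return x <= y ? new RowPairSketch(x, y) : new RowPairSketch(y, x);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RowPairSketch)) {
            return false;
        }
        RowPairSketch other = (RowPairSketch) o;
        return rowA == other.rowA && rowB == other.rowB;
    }

    @Override
    public int hashCode() {
        return 31 * rowA + rowB;
    }

    // Counts how many emitted pairs end up under each key,
    // mimicking how values are grouped by key before a reducer runs.
    static Map<RowPairSketch, Integer> group(int[][] emittedPairs) {
        Map<RowPairSketch, Integer> counts = new HashMap<>();
        for (int[] pair : emittedPairs) {
            counts.merge(of(pair[0], pair[1]), 1, Integer::sum);
        }
        return counts;
    }
}
```

With this ordering, pairs emitted as (1,2) from one column and (2,1) from another are grouped under one key instead of two. The first option in the description, changing the comparison in the key class itself, would achieve the same grouping without touching the mapper.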

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
