[ 
https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791971#action_12791971
 ] 

Ankur commented on MAHOUT-103:
------------------------------

Ok, so here is the next version, which I again rewrote completely :-( for 
performance reasons. This version computes item similarity and uses it to 
generate recommendations in true Hadoop fashion. In a nutshell, the 
recommendations are generated in 2 steps:

1. Join the item-similarity data (generated by analyzing co-occurrence) with 
the user-click data.
2. Group the output of step 1 on the user key, so that a single reducer 
receives all potential candidates for a user along with all items that user 
has already clicked/seen, which are then excluded from the final 
recommendation set.
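
To illustrate the two steps above, here is a minimal in-memory sketch in 
Python. It is not the code from the patch: the function names, the "seen" / 
"cand" tags, the summing of co-occurrence counts as a score, and the top_n 
cut-off are all assumptions made for illustration. On Hadoop, step 1 would be 
a join job and step 2 a reduce keyed on the user.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_similarity(clicks):
    """Count how often two items are clicked by the same user.

    clicks: iterable of (user, item) pairs.
    Returns {item: {other_item: count}} (hypothetical layout).
    """
    items_by_user = defaultdict(set)
    for user, item in clicks:
        items_by_user[user].add(item)

    sim = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            sim[a][b] += 1
            sim[b][a] += 1
    return sim


def join_step(clicks, sim):
    """Step 1: join each click with the similarity row of the clicked item.

    Emits (user, (tag, item, score)) records, where tag is "seen" for the
    clicked item itself and "cand" for each co-occurring item.
    """
    for user, item in clicks:
        yield user, ("seen", item, 0)
        for other, score in sim.get(item, {}).items():
            yield user, ("cand", other, score)


def group_step(joined, top_n=10):
    """Step 2: group by user, sum candidate scores, drop seen items."""
    seen = defaultdict(set)
    scores = defaultdict(lambda: defaultdict(int))
    for user, (tag, item, score) in joined:
        if tag == "seen":
            seen[user].add(item)
        else:
            scores[user][item] += score

    recs = {}
    for user in seen:
        ranked = sorted(
            ((i, s) for i, s in scores[user].items() if i not in seen[user]),
            key=lambda pair: (-pair[1], pair[0]),  # best score first
        )
        recs[user] = [item for item, _ in ranked[:top_n]]
    return recs


def recommend(clicks, top_n=10):
    """Run both steps on an in-memory list of (user, item) clicks."""
    clicks = list(clicks)
    sim = cooccurrence_similarity(clicks)
    return group_step(join_step(clicks, sim), top_n)
```

For example, recommend([("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"), 
("u2", "C")]) suggests "C" to u1 and nothing to u2, who has already seen 
every item.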

Also attached:
1. Perl script to convert the Netflix data into the required format (userId \t 
movieId) 
2. Bash script used to run it on a 50-node Hadoop cluster. The recommendations 
are generated for all the users in less than 45 min.
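
For readers without the attachment, the conversion to "userId \t movieId" 
can be sketched as below. This is not the attached Perl script; it is a 
Python sketch that assumes the Netflix Prize training-set layout, where each 
per-movie file starts with a "movieId:" line followed by 
"customerId,rating,date" lines. The function name is hypothetical.

```python
def convert_movie_file(lines):
    """Yield 'userId<TAB>movieId' strings from one per-movie ratings file.

    Assumes a 'movieId:' header line followed by 'customerId,rating,date'
    lines; the rating and date are dropped.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Header line naming the movie for the ratings that follow.
            movie_id = line[:-1]
            continue
        if movie_id is None:
            raise ValueError("rating line seen before a 'movieId:' header")
        user_id = line.split(",", 1)[0]
        yield f"{user_id}\t{movie_id}"
```

Usage on a file handle would look like 
"for row in convert_movie_file(open(path)): print(row)", with path pointing 
at one per-movie file.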

> Co-occurence based nearest neighbourhood
> ----------------------------------------
>
>                 Key: MAHOUT-103
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-103
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Ankur
>            Assignee: Ankur
>         Attachments: jira-103.patch, mahout-103.patch.v1
>
>
> Nearest neighborhood type queries for users/items can be answered efficiently 
> and effectively by analyzing the co-occurrence model of a user/item w.r.t 
> another. This patch aims at providing an implementation for answering such 
> queries based upon simple co-occurrence counts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
