[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791971#action_12791971 ]
Ankur commented on MAHOUT-103:
------------------------------

OK, here is the next version, which I again rewrote completely :-( for performance reasons. This version computes item similarity and uses it to generate recommendations in true Hadoop fashion. In a nutshell, the recommendations are generated in 2 steps:

1. Join the item-similarity data (generated by analyzing co-occurrence) with the user-click data.
2. Group the output of step 1 on the user key, so that a reducer receives all potential candidates for a user together with all items already clicked/seen by that user, which can then be excluded from the final recommendation set.

Also attached:

1. A Perl script to convert the Netflix data into the required format (userId \t movieId).
2. The Bash script used to run it on a 50-node Hadoop cluster.

The recommendations are generated for all users in less than 45 minutes.

> Co-occurence based nearest neighbourhood
> ----------------------------------------
>
>                 Key: MAHOUT-103
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-103
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Ankur
>            Assignee: Ankur
>         Attachments: jira-103.patch, mahout-103.patch.v1
>
>
> Nearest neighborhood type queries for users/items can be answered efficiently
> and effectively by analyzing the co-occurrence model of a user/item w.r.t
> another. This patch aims at providing an implementation for answering such
> queries based upon simple co-occurrence counts.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
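
For readers without the patch at hand, the two-step flow described in the comment can be sketched in memory roughly as follows. This is not the code from the attached patch: the function names, the tagged-record layout, and the choice to score a candidate by summing its co-occurrence counts are all assumptions made for illustration. The real job runs as Hadoop map/reduce passes rather than as Python loops.

```python
from collections import defaultdict


def cooccurrence_counts(clicks):
    """Item similarity as raw co-occurrence: the number of users who
    clicked both items. (Hypothetical helper, not from the patch.)"""
    items_by_user = defaultdict(set)
    for user, item in clicks:
        items_by_user[user].add(item)
    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a in items:
            for b in items:
                if a != b:
                    counts[a][b] += 1
    return counts


def join_similarity_with_clicks(similar_items, clicks):
    """Step 1: join item-similarity data with user-click data on the item.
    Emits records keyed by user, tagged 'seen' or 'candidate'."""
    records = []
    for user, item in clicks:
        # Pass the clicked item through so step 2 can exclude it.
        records.append((user, ("seen", item, 0)))
        for other, score in similar_items.get(item, {}).items():
            records.append((user, ("candidate", other, score)))
    return records


def recommend(records, top_n=10):
    """Step 2: group by user (the reducer's view), sum candidate scores,
    and drop items the user has already seen."""
    seen = defaultdict(set)
    scores = defaultdict(lambda: defaultdict(int))
    for user, (kind, item, score) in records:
        if kind == "seen":
            seen[user].add(item)
        else:
            scores[user][item] += score  # assumed scoring rule
    result = {}
    for user, seen_items in seen.items():
        ranked = sorted(
            ((i, s) for i, s in scores[user].items() if i not in seen_items),
            key=lambda pair: (-pair[1], pair[0]),
        )
        result[user] = [item for item, _ in ranked[:top_n]]
    return result
```

The input is a list of (userId, movieId) pairs, matching the tab-separated format the Perl script is said to produce. Calling `recommend(join_similarity_with_clicks(cooccurrence_counts(clicks), clicks))` returns a dict mapping each user to a ranked list of unseen items.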