[ 
https://issues.apache.org/jira/browse/MAHOUT-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated MAHOUT-648:
--------------------------------------

    Attachment: MAHOUT-648.patch

> API-changes for optimizing recommender performance in some usecases
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-648
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-648
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>         Attachments: MAHOUT-648.patch
>
>
> I'd like to propose a set of small API changes in our recommender code. 
> * add a method *allSimilarItemIDs(long itemID)* to ItemSimilarity, which 
> returns the ids of all similar items
> * make sure that *GenericItemBasedRecommender.recommend(...)* only makes *a 
> single call to the DataModel* with which it retrieves all preferences for the 
> user to recommend items for
> * add *a new strategy for finding candidate items* for the most-similar-items 
> and recommendation computation that only calls 
> ItemSimilarity.allSimilarItemIDs(...) and doesn't need to call anything on 
> the DataModel
> * add an option to GenericItemSimilarity to make it create an in-memory-index 
> to allow *retrieval of all similar items per item in constant time*
> The purpose of these changes is to make it possible to run a very efficient 
> recommender for use cases where the major purpose of the recommender is to 
> answer requests for most-similar-items and you only have to compute "real" 
> recommendations from time to time. A typical scenario where these conditions 
> are met is e-commerce: you have lots of most-similar-items calls as users 
> browse product pages and fill their shopping carts, and for the minority of 
> users that log in you have to provide personalized product recommendations.
> With the proposed changes, you need to precompute the item-similarities and 
> load them into memory, either from a file with FileItemSimilarity or from a 
> database with the new MySQLJDBCInMemoryItemSimilarity and use a 
> GenericItemBasedRecommender with the AllSimilarItemsCandidateItemsStrategy. 
> Requests for most-similar-items can be completely answered from memory (in 
> nearly constant time) without having to touch the DataModel. Answering 100 
> requests per second on a single machine is no problem using this approach.
> We can then use *a DataModel that does not need to reside in memory* because 
> its only task is to act as a repository for the users' preferences. When we 
> compute personalized recommendations, we need exactly one call to the 
> datastore to retrieve all the preferences of the user we want to compute 
> recommendations for. This single call should be very fast with our already 
> existing JDBC-backed DataModels, and it should be easy to implement it 
> equally fast in other datastores, like Solr for example. One could even start 
> thinking about sharded DataModels with this approach.
> Another very big advantage of this approach is that *user preferences can now 
> be updated in realtime* as we never need to refresh the datamodel. We only 
> need to refresh the item-similarities from time to time. Memory requirements 
> for the recommender machines would drop drastically as we *only have to store 
> the item-similarities in RAM*, whose number should be orders of magnitude 
> smaller than the number of preferences.
> The API changes in the patch should be fully backwards compatible, so that 
> this new approach is only an additional way to use our recommender code and 
> all currently existing approaches still work as before.
> Here is an example of how such a setup would work using a MySQL database:
> {noformat}
> DataSource dataSource = ...
> DataModel dataModel = new MySQLJDBCDataModel(dataSource);
> /* load all item-similarities into memory, create an index for fast retrieval 
> of all-similar-item-ids */
> ItemSimilarity itemSimilarity = new MySQLJDBCInMemoryItemSimilarity(dataSource, 
> true);
> /* the candidate items for recommendation and most-similar-items are only 
> fetched from our in-memory data structures by this strategy*/
> AllSimilarItemsCandidateItemsStrategy allSimilarItemsStrategy = new 
> AllSimilarItemsCandidateItemsStrategy(itemSimilarity);
> ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, 
> itemSimilarity, allSimilarItemsStrategy, allSimilarItemsStrategy);
> {noformat}
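
To illustrate the in-memory index mentioned in the description (retrieval of all similar items per item in constant time), here is a minimal, self-contained sketch. It is not code from the patch and does not use any Mahout classes: the class name SimilarItemsIndex and the method addSimilarity are hypothetical, and only the method name allSimilarItemIDs(long itemID) is taken from the proposal. It assumes similarities are symmetric and that a plain hash map is an acceptable index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for an in-memory index over precomputed item
 * similarities. Illustration only, not the implementation in the patch.
 */
class SimilarItemsIndex {

  // itemID -> IDs of all items that have a stored similarity to it
  private final Map<Long, List<Long>> similarItemIDs = new HashMap<>();

  /** Registers a (symmetric) similarity between two items in the index. */
  void addSimilarity(long itemA, long itemB) {
    similarItemIDs.computeIfAbsent(itemA, k -> new ArrayList<>()).add(itemB);
    similarItemIDs.computeIfAbsent(itemB, k -> new ArrayList<>()).add(itemA);
  }

  /**
   * Returns the IDs of all items similar to the given one. The lookup is a
   * single hash access (expected constant time); copying the result is
   * linear in the number of similar items.
   */
  long[] allSimilarItemIDs(long itemID) {
    List<Long> ids = similarItemIDs.get(itemID);
    if (ids == null) {
      return new long[0]; // unknown item: no candidates
    }
    long[] result = new long[ids.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = ids.get(i);
    }
    return result;
  }
}
```

A candidate-items strategy built on such an index would only need to call allSimilarItemIDs(...) for the item in question (or for each item the user has a preference for) and would never have to touch the DataModel.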

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
