[
https://issues.apache.org/jira/browse/MAHOUT-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Schelter updated MAHOUT-648:
--------------------------------------
Status: Patch Available (was: Open)
> API-changes for optimizing recommender performance in some usecases
> -------------------------------------------------------------------
>
> Key: MAHOUT-648
> URL: https://issues.apache.org/jira/browse/MAHOUT-648
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Attachments: MAHOUT-648.patch
>
>
> I'd like to propose a set of small API changes in our recommender code.
> * add a method *allSimilarItemIDs(long itemID)* to ItemSimilarity, which
> returns the ids of all similar items
> * make sure that *GenericItemBasedRecommender.recommend(...)* only makes *a
> single call to the DataModel* with which it retrieves all preferences for the
> user to recommend items for
> * add *a new strategy for finding candidate items* for the most-similar-items
> and recommendation computation that only calls
> ItemSimilarity.allSimilarItemIDs(...) and doesn't need to call anything on
> the DataModel
> * add an option to GenericItemSimilarity that makes it build an in-memory
> index to allow *retrieval of all similar items per item in constant time*
> The purpose of these changes is to make it possible to run a very efficient
> recommender for use cases where the main job of the recommender is to
> answer requests for most-similar-items and you only have to compute "real"
> recommendations from time to time. A typical scenario where these conditions
> are met is e-commerce: you have lots of most-similar-items calls as users
> browse product pages and fill their shopping carts, and for the minority of
> users that log in you have to provide personalized product recommendations.
> With the proposed changes, you need to precompute the item-similarities and
> load them into memory, either from a file with FileItemSimilarity or from a
> database with the new MySQLJDBCInMemoryItemSimilarity and use a
> GenericItemBasedRecommender with the AllSimilarItemsCandidateItemsStrategy.
> Requests for most-similar-items can be completely answered from memory (in
> nearly constant time) without having to touch the DataModel. Answering 100
> requests per second on a single machine is no problem using this approach.
> We can then use *a DataModel that does not need to reside in memory* because
> its only task is to act as a repository for the users' preferences. When we
> compute personalized recommendations, we need to make exactly one call to
> the datastore to retrieve all the preferences for the user we want to compute
> recommendations for. This single call should be very fast with our already
> existing JDBC-backed DataModels, and it should be easy to implement it
> equally fast in other datastores, Solr for example. One could even start
> thinking about sharded DataModels with this approach.
> Another very big advantage of this approach is that *user preferences can now
> be updated in real time* because we never need to refresh the DataModel. We only
> need to refresh the item-similarities from time to time. Memory requirements
> for the recommender machines would drop drastically as we *only have to store
> the item-similarities in RAM* whose number should be orders of magnitude
> smaller than the number of preferences.
> The API changes in the patch should be fully backwards compatible, so that
> this new approach is only an additional way to use our recommender code and
> all currently existing approaches still work as before.
> Here is an example of how such a setup would work using a MySQL database:
> {noformat}
> DataSource dataSource = ...
> DataModel dataModel = new MySQLJDBCDataModel(dataSource);
> /* load all item-similarities into memory, create an index for fast retrieval
>    of all-similar-item-ids */
> ItemSimilarity itemSimilarity =
>     new MySQLJDBCInMemoryItemSimilarity(dataSource, true);
> /* the candidate items for recommendation and most-similar-items are only
>    fetched from our in-memory data structures by this strategy */
> AllSimilarItemsCandidateItemsStrategy allSimilarItemsStrategy =
>     new AllSimilarItemsCandidateItemsStrategy(itemSimilarity);
> ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel,
>     itemSimilarity, allSimilarItemsStrategy, allSimilarItemsStrategy);
> {noformat}
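
To illustrate the idea behind the in-memory index and the candidate strategy described in the issue, here is a minimal, self-contained sketch in plain Java. It is not the code from MAHOUT-648.patch: the class SimilarItemsIndex and its methods addPair and candidateItems are hypothetical names chosen for this illustration, and only allSimilarItemIDs mirrors a method name from the proposal. The sketch assumes similarities are symmetric and that the candidate items for a user are the union of the items similar to the user's preferred items, minus the items the user already has.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; not Mahout's GenericItemSimilarity. */
final class SimilarItemsIndex {

  // itemID -> IDs of all items that have a precomputed similarity to it
  private final Map<Long, Set<Long>> similarItems = new HashMap<>();

  /** Registers one precomputed similarity pair (assumed symmetric). */
  void addPair(long itemA, long itemB) {
    similarItems.computeIfAbsent(itemA, k -> new HashSet<>()).add(itemB);
    similarItems.computeIfAbsent(itemB, k -> new HashSet<>()).add(itemA);
  }

  /**
   * Finds the neighbours of an item with a single hash lookup, so the cost
   * does not depend on the total number of items. The result is sorted only
   * to make the output deterministic.
   */
  long[] allSimilarItemIDs(long itemID) {
    Set<Long> ids = similarItems.getOrDefault(itemID, Collections.emptySet());
    return ids.stream().mapToLong(Long::longValue).sorted().toArray();
  }

  /**
   * Candidate items for a user: everything similar to an item the user
   * prefers, excluding items the user already prefers. Only the index is
   * consulted; the preferred item IDs are passed in by the caller.
   */
  long[] candidateItems(long[] preferredItemIDs) {
    Set<Long> preferred = new HashSet<>();
    for (long id : preferredItemIDs) {
      preferred.add(id);
    }
    Set<Long> candidates = new TreeSet<>(); // TreeSet keeps the IDs ordered
    for (long id : preferredItemIDs) {
      for (long similar : similarItems.getOrDefault(id, Collections.emptySet())) {
        if (!preferred.contains(similar)) {
          candidates.add(similar);
        }
      }
    }
    return candidates.stream().mapToLong(Long::longValue).toArray();
  }
}
```

In this sketch, a most-similar-items request would only call allSimilarItemIDs, while a personalized recommendation would first fetch the user's preferred item IDs from the datastore in one call and then pass them to candidateItems.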
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira