[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration

Peng Cheng (JIRA) Mon, 12 Aug 2013 16:46:01 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737553#comment-13737553
 ]


Peng Cheng commented on MAHOUT-1286:
------------------------------------

Hi Gentlemen,

Thanks a lot for proving my point, Gokhan. Yes, I meant that either user or 
item preference extraction can be fast, but not both.

Sorry, I should have raised this in our last hangout, but I missed the 
invitation :-< I have tried to understand your proposal on 
recommendation-as-search, though.

From what I heard on YouTube, the new architecture is proposed as an easier 
and faster replacement for all existing recommenders that take a DataModel. 
Each item is a weighted 'bag of words' generated by cooccurrence analysis/item 
similarity on previous ratings. A new user's ratings are converted into a 
weighted tuple of existing words and matched with the items that have the 
highest sum of hits.
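
The matching step described above can be sketched in a few lines of plain 
Python. This is only an illustration of the idea, not the proposed 
architecture and not a Lucene API: the names `item_index` and `recommend`, 
the toy weights, and the choice to multiply each hit by the user's rating are 
all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy index: each item is a weighted "bag of words", where the
# "words" are IDs of other items it co-occurs with in previous ratings.
item_index = {
    "item_a": {"item_b": 2.0, "item_c": 1.0},
    "item_b": {"item_a": 2.0, "item_d": 0.5},
    "item_c": {"item_a": 1.0},
    "item_d": {"item_b": 0.5, "item_c": 1.5},
}


def recommend(user_ratings, index, top_n=2):
    """Rank unseen items by the weighted sum of their hits on the query."""
    scores = defaultdict(float)
    for item, bag in index.items():
        if item in user_ratings:
            continue  # skip items the user has already rated
        for word, weight in bag.items():
            if word in user_ratings:
                # A "hit": the item's bag contains a word from the query.
                scores[item] += weight * user_ratings[word]
    # Highest score first; ties broken by item ID for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# The new user's ratings act as the weighted query.
new_user = {"item_a": 1.0, "item_c": 0.5}
result = recommend(new_user, item_index)
```

Note that the new user never has to be merged into a trained model here: the 
ratings are only used as a query against a precomputed index.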

My concerns are: 1) Does it support all types of recommenders and their 
ensembles? I know modern search engines like Google and Yandex have fairly 
complex ensemble search and ranking algorithms that look similar to an 
ensemble recommender, but IMHO Lucene is built only for text search, and I am 
not sure to what extent it is customizable. 2) Does it support online 
learning? This feature is more important to SVDRecommender, as a new user's 
recommendations are only known once this user is merged into the model. (Of 
course, an option is to project a new user into the user subspace by 
minimising the distance between its ratings and its dot products with the 
existing items, but nobody has tested its performance before.)
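
The projection option mentioned in the parenthesis can be sketched as a small 
regularised least-squares problem: keep the item factors fixed and solve for 
the one user vector whose dot products with the rated items best match the 
ratings. This is a minimal sketch under assumptions, not SVDRecommender's 
code: the function names, the `reg` parameter and the toy factors are 
hypothetical, and it says nothing about how well this performs in practice.

```python
def fold_in_user(ratings, item_factors, reg=0.1):
    """Solve (V^T V + reg * I) u = V^T r for a new user's factor vector u.

    ratings:      {item_id: rating} given by the new user
    item_factors: {item_id: [k floats]} taken from an already trained model
    """
    k = len(next(iter(item_factors.values())))
    # Build the k x k normal-equation matrix and the right-hand side.
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, rating in ratings.items():
        v = item_factors[item]
        for i in range(k):
            b[i] += rating * v[i]
            for j in range(k):
                a[i][j] += v[i] * v[j]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, k):
            factor = m[row][col] / m[col][col]
            for c in range(col, k + 1):
                m[row][c] -= factor * m[col][c]
    # Back substitution.
    u = [0.0] * k
    for i in range(k - 1, -1, -1):
        rest = sum(m[i][j] * u[j] for j in range(i + 1, k))
        u[i] = (m[i][k] - rest) / m[i][i]
    return u


def predict(user_vector, item_vector):
    """Predicted preference is the dot product of the two factor vectors."""
    return sum(x * y for x, y in zip(user_vector, item_vector))


# Hypothetical item factors with k = 2; item "c" is unseen by the new user.
item_factors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
new_user_ratings = {"a": 4.0, "b": 2.0}
u = fold_in_user(new_user_ratings, item_factors)
estimate_for_c = predict(u, item_factors["c"])
```

The item factors are never updated here, so this is cheaper than retraining 
but also not the same thing as merging the user into the model.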
                
> Memory-efficient DataModel, supporting fast online updates and element-wise 
> iteration
> -------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1286
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1286
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>            Reporter: Peng Cheng
>            Assignee: Sean Owen
>              Labels: collaborative-filtering, datamodel, patch, recommender
>             Fix For: 0.9
>
>         Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Most DataModel implementations in the current CF component use hash maps to 
> enable fast 2D indexing and updates. This is not memory-efficient for big 
> data sets, e.g. the Netflix Prize dataset takes 11G of heap space as a 
> FileDataModel.
> An improved implementation of DataModel should use more compact data 
> structures (like arrays); this trades a little time complexity in 2D 
> indexing for a vast improvement in memory efficiency. In addition, any 
> online recommender or online-to-batch converted recommender will not be 
> affected by this in its training process.
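
To illustrate the trade-off described in the issue, here is a minimal sketch 
of a preference store packed into parallel arrays and sorted by user. It is 
not the attached InMemoryDataModel.java; the class and method names are 
hypothetical, user IDs are assumed to be dense integers starting at 0, and 
the input is assumed to be non-empty.

```python
from array import array
from bisect import bisect_left


class CompactUserMajorModel:
    """Preferences packed into parallel arrays, sorted by (user, item)."""

    def __init__(self, triples):
        # triples: non-empty iterable of (user, item, value)
        triples = sorted(triples)
        n_users = max(user for user, _, _ in triples) + 1
        self.item_ids = array("i", (item for _, item, _ in triples))
        self.values = array("d", (value for _, _, value in triples))
        # row_start[u] .. row_start[u + 1] bounds the slice for user u.
        self.row_start = array("i", [0] * (n_users + 1))
        for user, _, _ in triples:
            self.row_start[user + 1] += 1
        for user in range(n_users):
            self.row_start[user + 1] += self.row_start[user]

    def prefs_from_user(self, user):
        """Fast: one contiguous slice of the arrays."""
        lo, hi = self.row_start[user], self.row_start[user + 1]
        return list(zip(self.item_ids[lo:hi], self.values[lo:hi]))

    def pref(self, user, item):
        """2D lookup by binary search within the user's slice."""
        lo, hi = self.row_start[user], self.row_start[user + 1]
        pos = bisect_left(self.item_ids, item, lo, hi)
        if pos < hi and self.item_ids[pos] == item:
            return self.values[pos]
        return None

    def prefs_for_item(self, item):
        """Slow in this layout: every user's slice has to be searched."""
        found = []
        for user in range(len(self.row_start) - 1):
            value = self.pref(user, item)
            if value is not None:
                found.append((user, value))
        return found


model = CompactUserMajorModel(
    [(0, 10, 4.0), (0, 12, 3.0), (1, 10, 5.0), (2, 11, 2.0)]
)
```

In this layout a single lookup costs a binary search instead of a hash probe, 
and only one access direction (here, by user) is a cheap slice; the other 
direction needs a scan or a second, item-sorted copy of the data.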

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
