[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration

Ted Dunning (JIRA) Mon, 12 Aug 2013 17:22:44 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737606#comment-13737606 ]


Ted Dunning commented on MAHOUT-1286:
-------------------------------------

Recommendation as search is just one model.  I want to have a good demo of that 
available so that people can deploy recommenders very easily.

I am generally less enthusiastic about other forms of recommenders, but 
definitely not to the extent of thinking that Mahout should not support them.

Regarding your other questions,

1) yes, search engines can support real-time learning of some forms and can 
update during recommendation operations.

2) no, search engines like Solr or Lucene only support models that can be 
sparsified.  You can build simple ensembles using complex queries, however.

3) (the question you didn't ask) one particular strength of recommendation as 
search is that it supports multi-model recommendation.
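
As a rough illustration of points 2 and 3, here is a minimal sketch of how a 
user's recent history could be turned into one query that combines several 
indicator fields.  The class name, the field names (purchase_indicators, 
view_indicators), the item IDs and the boost values are all hypothetical, and 
nothing here calls Solr or Lucene; it only assembles a string in the classic 
Lucene query-parser syntax (field:(a OR b)^boost).  It also assumes item IDs 
are plain tokens that need no escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds a multi-field indicator query string. */
final class IndicatorQueryBuilder {

    /** One clause such as purchase_indicators:(i17 OR i42)^2.0, or "" if there is no history. */
    static String fieldClause(String field, List<String> itemIds, double boost) {
        if (itemIds.isEmpty()) {
            return "";
        }
        return field + ":(" + String.join(" OR ", itemIds) + ")^" + boost;
    }

    /** ORs together one clause per field; fields without a configured boost get 1.0. */
    static String build(Map<String, List<String>> historyByField, Map<String, Double> boosts) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : historyByField.entrySet()) {
            double boost = boosts.getOrDefault(e.getKey(), 1.0);
            String clause = fieldClause(e.getKey(), e.getValue(), boost);
            if (!clause.isEmpty()) {
                clauses.add(clause);
            }
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // Assumed example history: two purchased items and one viewed item.
        Map<String, List<String>> history = new LinkedHashMap<>();
        history.put("purchase_indicators", Arrays.asList("i17", "i42"));
        history.put("view_indicators", Arrays.asList("i99"));

        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("purchase_indicators", 2.0);

        // prints: purchase_indicators:(i17 OR i42)^2.0 OR view_indicators:(i99)^1.0
        System.out.println(build(history, boosts));
    }
}
```

The resulting string would be handed to whatever search engine holds the item 
documents; how those indicator fields get populated is outside this sketch.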


My own feeling is that getting basic recommendations up quickly allows more 
time to experiment with additional data sources and with alternative UI 
presentations, business rules and dithering.  In my experience these make a 
much larger difference than the basic recommendation algorithm, so getting 
recommendations up quickly and not spending time on the algorithm itself is 
often warranted.  If you still have time and engineers to spend once all of 
that is very, very good, then going back to improve the algorithms can make 
good sense.  I have never seen a startup that had this time or these 
engineers, and only a few large companies have them.

So my strategy here is to facilitate early successes without endangering longer 
term optimizations.
                
> Memory-efficient DataModel, supporting fast online updates and element-wise 
> iteration
> -------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1286
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1286
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>            Reporter: Peng Cheng
>            Assignee: Sean Owen
>              Labels: collaborative-filtering, datamodel, patch, recommender
>             Fix For: 0.9
>
>         Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Most DataModel implementations in the current CF component use hash maps to 
> enable fast 2D indexing and updates. This is not memory-efficient for big 
> data sets; e.g. the Netflix Prize dataset takes 11G of heap space as a 
> FileDataModel.
> An improved DataModel implementation should use more compact data structures 
> (like arrays). This trades a little time complexity in 2D indexing for a 
> vast improvement in memory efficiency. In addition, any online recommender or 
> online-to-batch converted recommender will not be affected by this in the 
> training process.
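
To make the hash-map versus array trade-off in the quoted description 
concrete, here is a minimal sketch of preferences stored in parallel primitive 
arrays with binary-search lookup.  It is not the attached 
InMemoryDataModel.java and does not implement Mahout's DataModel interface; 
the class name, the layout and the sample data are assumptions made for 
illustration only.

```java
import java.util.Arrays;

/** Hypothetical sketch of array-backed preference storage (not the attached patch). */
final class CompactPreferenceArrays {
    private final long[] userIds;  // sorted ascending, one entry per user
    private final int[] offsets;   // user u owns slots offsets[u] .. offsets[u + 1] - 1
    private final long[] itemIds;  // item IDs, sorted ascending within each user's slice
    private final float[] values;  // preference values, parallel to itemIds

    CompactPreferenceArrays(long[] userIds, int[] offsets, long[] itemIds, float[] values) {
        if (offsets.length != userIds.length + 1 || itemIds.length != values.length) {
            throw new IllegalArgumentException("inconsistent array lengths");
        }
        this.userIds = userIds;
        this.offsets = offsets;
        this.itemIds = itemIds;
        this.values = values;
    }

    /** Index into itemIds/values for (userId, itemId), or -1 if there is no such preference. */
    private int slot(long userId, long itemId) {
        int u = Arrays.binarySearch(userIds, userId);
        if (u < 0) {
            return -1;
        }
        int i = Arrays.binarySearch(itemIds, offsets[u], offsets[u + 1], itemId);
        return i < 0 ? -1 : i;
    }

    /** 2D lookup in O(log users + log items-per-user) instead of a hash map's O(1). */
    float getPreference(long userId, long itemId) {
        int i = slot(userId, itemId);
        return i < 0 ? Float.NaN : values[i];
    }

    /** Overwrites an existing preference in place; returns false if it does not exist. */
    boolean setPreference(long userId, long itemId, float value) {
        int i = slot(userId, itemId);
        if (i < 0) {
            return false;
        }
        values[i] = value;
        return true;
    }

    /** Element-wise iteration is a plain scan over the value array. */
    double sumOfPreferences() {
        double sum = 0.0;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed sample data: user 1 rated items 10 and 20, user 7 rated item 10.
        CompactPreferenceArrays model = new CompactPreferenceArrays(
                new long[] {1L, 7L},
                new int[] {0, 2, 3},
                new long[] {10L, 20L, 10L},
                new float[] {4.0f, 2.5f, 5.0f});
        System.out.println(model.getPreference(1L, 20L)); // prints 2.5
        System.out.println(model.getPreference(7L, 20L)); // prints NaN
    }
}
```

This sketch only updates preferences that already exist; inserting a new 
user or item would require shifting or rebuilding the arrays, which is the 
part a real implementation would have to solve for fast online updates.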

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
