Peng,

This is the reason I separated out the DataModel and only put the learner stuff there. The learner I mentioned yesterday just stores the parameters, (noOfUsers + noOfItems) * noOfLatentFactors of them, and does not care where preferences are stored.
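To make the idea concrete, here is a minimal sketch of such a parameter-only learner: it owns exactly (noOfUsers + noOfItems) * noOfLatentFactors doubles in two flat primitive arrays and knows nothing about preference storage. Class and method names are illustrative, not Mahout's actual API.

```java
public class FactorizationParameters {
    private final double[] userFactors; // noOfUsers * noOfLatentFactors entries
    private final double[] itemFactors; // noOfItems * noOfLatentFactors entries
    private final int numFactors;

    public FactorizationParameters(int noOfUsers, int noOfItems, int noOfLatentFactors) {
        this.userFactors = new double[noOfUsers * noOfLatentFactors];
        this.itemFactors = new double[noOfItems * noOfLatentFactors];
        this.numFactors = noOfLatentFactors;
    }

    // Estimated preference = dot product of the user's and item's factor rows.
    public double estimate(int userIndex, int itemIndex) {
        double sum = 0.0;
        for (int f = 0; f < numFactors; f++) {
            sum += userFactors[userIndex * numFactors + f]
                 * itemFactors[itemIndex * numFactors + f];
        }
        return sum;
    }

    public void setUserFactor(int userIndex, int f, double value) {
        userFactors[userIndex * numFactors + f] = value;
    }

    public void setItemFactor(int itemIndex, int f, double value) {
        itemFactors[itemIndex * numFactors + f] = value;
    }

    public static void main(String[] args) {
        FactorizationParameters p = new FactorizationParameters(2, 3, 2);
        p.setUserFactor(0, 0, 1.0); p.setUserFactor(0, 1, 2.0);
        p.setItemFactor(1, 0, 3.0); p.setItemFactor(1, 1, 0.5);
        System.out.println(p.estimate(0, 1)); // 1*3 + 2*0.5 = 4.0
    }
}
```

Because the learner only touches indices, the same object works whether preferences come from a heap-resident DataModel or a disk-backed one.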
I kind of agree with the multi-level DataModel approach: one for iterating over "all" preferences, and one for when someone wants to deploy a recommender and perform a lot of top-N recommendation tasks. (Or one DataModel with a strategy that might reduce existing memory consumption while still providing fast access; I am not sure. Let me try a matrix-backed DataModel approach.)

Gokhan

On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter <s...@apache.org> wrote:
> I completely agree, Netflix is less than one gigabyte in a smart
> representation; 12x more memory is a no-go. The techniques used in
> FactorizablePreferences allow a much more memory-efficient representation,
> tested on the KDD Music dataset, which is approx 2.5 times Netflix and
> fits into 3 GB with that approach.
>
>
> 2013/7/16 Ted Dunning <ted.dunn...@gmail.com>
>
> > Netflix is a small dataset. 12 GB for that seems quite excessive.
> >
> > Note also that this is before you have done any work.
> >
> > Ideally, 100 million observations should take << 1 GB.
> >
> > On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <pc...@uowmail.edu.au>
> > wrote:
> >
> > > The second idea is indeed splendid; we should separate time-complexity-
> > > first and space-complexity-first implementations. What I'm not quite
> > > sure about is whether we really need to create two interfaces instead
> > > of one. Personally, I think 12 GB of heap space is not that high,
> > > right? Most new laptops can already handle that (emphasis on laptop).
> > > And if we replace the hash map (the culprit of high memory
> > > consumption) with a list/linked list, it would simply degrade lookups
> > > to a linear search in O(n), not too bad either. The current DataModel
> > > is a result of careful thought and has undergone extensive testing;
> > > it is easier to expand on top of it than to subvert it.
> > >
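One way the matrix-backed idea above could look is a CSR-style layout: all preferences in two flat parallel arrays with one offset per user, instead of boxed per-user hash maps. Under that assumption, 100 million observations cost roughly 100M * (4 bytes item index + 4 bytes float rating) ≈ 800 MB plus offsets, and lookups within one user are the linear scan Peng describes, bounded by that user's preference count. This is a hypothetical sketch; names are not Mahout's actual API.

```java
public class CompactPreferences {
    private final int[] userOffsets;  // userOffsets[u]..userOffsets[u+1] bound user u's slice
    private final int[] itemIndices;  // item index of each preference, grouped by user
    private final float[] ratings;    // rating of each preference, parallel to itemIndices

    public CompactPreferences(int[] userOffsets, int[] itemIndices, float[] ratings) {
        this.userOffsets = userOffsets;
        this.itemIndices = itemIndices;
        this.ratings = ratings;
    }

    // Linear scan within one user's slice: no hashing, no boxed Long/Float objects.
    public float getPreference(int user, int item) {
        for (int i = userOffsets[user]; i < userOffsets[user + 1]; i++) {
            if (itemIndices[i] == item) {
                return ratings[i];
            }
        }
        return Float.NaN; // unknown preference
    }

    public static void main(String[] args) {
        // Two users: user 0 rated items {1, 5}, user 1 rated item {2}.
        CompactPreferences prefs = new CompactPreferences(
            new int[] {0, 2, 3},
            new int[] {1, 5, 2},
            new float[] {4.0f, 2.5f, 5.0f});
        System.out.println(prefs.getPreference(0, 5)); // 2.5
        System.out.println(prefs.getPreference(1, 1)); // NaN
    }
}
```

Iterating over "all" preferences is just a single pass over the flat arrays, which is the access pattern a learner needs; fast random top-N access is where the second, differently indexed DataModel (or strategy) would come in.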