I completely agree. Netflix is less than one gigabyte in a smart representation; 12x more memory is a no-go. The techniques used in FactorizablePreferences allow a much more memory-efficient representation, tested on the KDD Music dataset, which is approximately 2.5 times the size of Netflix and fits into 3 GB with that approach.
2013/7/16 Ted Dunning <[email protected]>

> Netflix is a small dataset. 12G for that seems quite excessive.
>
> Note also that this is before you have done any work.
>
> Ideally, 100 million observations should take << 1GB.
>
> On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <[email protected]> wrote:
>
> > The second idea is indeed splendid; we should separate the
> > time-complexity-first and space-complexity-first implementations. What
> > I'm not quite sure about is whether we really need to create two
> > interfaces instead of one. Personally, I think 12G of heap space is not
> > that high, right? Most new laptops can already handle that (emphasis on
> > laptop). And if we replaced the hash map (the culprit of the high memory
> > consumption) with a list/linked list, it would simply degrade the time
> > complexity of a lookup to an O(n) linear search, which is not too bad
> > either. The current DataModel is the result of careful thought and has
> > undergone extensive testing; it is easier to expand on top of it than to
> > subvert it.
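To make the memory argument concrete, here is a minimal sketch of the kind of compact layout being discussed. This is a hypothetical illustration, not the actual FactorizablePreferences or Mahout DataModel code: it packs each (user, item) pair into a single sorted long key with ratings in a parallel primitive float array, costing 12 bytes per observation instead of the ~48+ bytes per entry that a `HashMap<Long, Float>` of boxed objects needs, at the price of an O(log n) binary-search lookup instead of O(1).

```java
import java.util.Arrays;

/**
 * Hypothetical compact preference store (illustration only, not Mahout's
 * DataModel): parallel primitive arrays instead of a HashMap of boxed
 * objects. 100M observations cost roughly 100M * 12 bytes = ~1.2 GB here;
 * a per-user CSR-style layout with int item ids can shrink that further.
 */
public class CompactPreferences {
    private final long[] keys;    // (userId << 32) | itemId, sorted ascending
    private final float[] values; // ratings, aligned index-for-index with keys

    /** keys must already be sorted; values[i] is the rating for keys[i]. */
    public CompactPreferences(long[] sortedKeys, float[] ratings) {
        this.keys = sortedKeys;
        this.values = ratings;
    }

    /** Packs a (user, item) pair into one long, matching the key layout. */
    public static long key(int userId, int itemId) {
        return ((long) userId << 32) | (itemId & 0xFFFFFFFFL);
    }

    /** Returns the rating, or NaN if the pair was never observed. */
    public float getRating(int userId, int itemId) {
        int i = Arrays.binarySearch(keys, key(userId, itemId));
        return i >= 0 ? values[i] : Float.NaN;
    }
}
```

The trade-off mirrors the thread: the hash map is fastest but heaviest; a plain list would make lookups O(n); sorted primitive arrays sit in between with O(log n) lookups and near-minimal memory.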
