I completely agree: Netflix is less than one gigabyte in a smart
representation, so 12x more memory is a no-go. The techniques used in
FactorizablePreferences allow a much more memory-efficient representation;
tested on the KDD Music dataset, which is approximately 2.5 times the size
of Netflix, it fits into 3GB with that approach.
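
To make the memory argument concrete, here is a minimal sketch of the kind of compact layout I mean: preferences grouped per user in primitive parallel arrays (a CSR-style layout) instead of a hash map of boxed objects. This is only an illustration, not the actual FactorizablePreferences code; the class and method names are made up for the example, and lookup within a user's row is a linear scan, O(n) in that user's preference count.

```java
// Hypothetical sketch of a compact preference store (not Mahout's actual
// DataModel): primitive parallel arrays grouped by user, CSR-style.
// Per observation this costs one int (item ID) plus one float (rating),
// avoiding per-entry object headers and boxing of a HashMap-based layout.
public class CompactPreferences {

    private final int[] rowOffsets; // rowOffsets[u]..rowOffsets[u+1] is user u's range
    private final int[] itemIds;    // item IDs, grouped by user
    private final float[] values;   // ratings, parallel to itemIds

    public CompactPreferences(int[] rowOffsets, int[] itemIds, float[] values) {
        this.rowOffsets = rowOffsets;
        this.itemIds = itemIds;
        this.values = values;
    }

    /** Linear search within one user's row; returns NaN if the item is absent. */
    public float get(int user, int item) {
        for (int i = rowOffsets[user]; i < rowOffsets[user + 1]; i++) {
            if (itemIds[i] == item) {
                return values[i];
            }
        }
        return Float.NaN;
    }
}
```

If each user's items were kept sorted, the linear scan could be replaced with a binary search, but even the plain scan keeps the per-observation footprint at roughly 8 bytes plus offsets.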


2013/7/16 Ted Dunning <[email protected]>

> Netflix is a small dataset.  12G for that seems quite excessive.
>
> Note also that this is before you have done any work.
>
> Ideally, 100million observations should take << 1GB.
>
> On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <[email protected]> wrote:
>
> > The second idea is indeed splendid; we should separate the
> > time-complexity-first and space-complexity-first implementations. What
> > I'm not quite sure about is whether we really need to create two
> > interfaces instead of one. Personally, I think 12G of heap space is not
> > that high, right? Most new laptops can already handle that (emphasis on
> > laptop). And if we replace the hash map (the culprit of the high memory
> > consumption) with a list/linked list, it would simply degrade the
> > lookup to a linear search, O(n), which is not too bad either. The
> > current DataModel is the result of careful thought and has undergone
> > extensive testing; it is easier to expand on top of it than to subvert
> > it.
>
