Peng,

This is the reason I separated out the DataModel and put only the learner
stuff there. The learner I mentioned yesterday just stores the parameters,
(noOfUsers + noOfItems) * noOfLatentFactors values in total, and does not
care where the preferences are stored.
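
For concreteness, a minimal sketch of what that learner state could look
like (the class and member names here are illustrative, not actual Mahout
code):

// Sketch only: the learner holds just the factor matrices,
// (noOfUsers + noOfItems) * noOfLatentFactors doubles in total, and is
// oblivious to where the preferences themselves live.
class LatentFactorLearner {

  private final double[][] userFactors; // noOfUsers x noOfLatentFactors
  private final double[][] itemFactors; // noOfItems x noOfLatentFactors

  LatentFactorLearner(int noOfUsers, int noOfItems, int noOfLatentFactors) {
    this.userFactors = new double[noOfUsers][noOfLatentFactors];
    this.itemFactors = new double[noOfItems][noOfLatentFactors];
  }

  // Prediction touches only the parameters, never a DataModel.
  double predict(int userIndex, int itemIndex) {
    double estimate = 0.0;
    for (int f = 0; f < userFactors[userIndex].length; f++) {
      estimate += userFactors[userIndex][f] * itemFactors[itemIndex][f];
    }
    return estimate;
  }
}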

I kind of agree with the multi-level DataModel approach: one for iterating
over "all" preferences, and one for deploying a recommender and performing
a lot of top-N recommendation tasks.

(Or a single DataModel with a strategy that reduces the current memory
consumption while still providing fast access; I am not sure. Let me try a
matrix-backed DataModel approach.)
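
Roughly what I have in mind for the matrix-backed variant (a sketch under
assumptions: CSR-style parallel primitive arrays instead of per-user hash
maps; none of these names exist in the codebase):

// Sketch only: one sorted run of (itemIndex, value) pairs per user,
// addressed through an offsets array. No boxing, no HashMap overhead:
// roughly 8 bytes per preference plus 4 bytes per user.
class MatrixBackedPreferences {

  private final int[] offsets;     // user u's range: offsets[u]..offsets[u+1]
  private final int[] itemIndexes; // item index per preference, sorted per user
  private final float[] values;    // preference value per entry

  MatrixBackedPreferences(int[] offsets, int[] itemIndexes, float[] values) {
    this.offsets = offsets;
    this.itemIndexes = itemIndexes;
    this.values = values;
  }

  // Fast sequential access, for "iterate over all preferences" training.
  void forEachPreferenceOfUser(int user, PreferenceVisitor visitor) {
    for (int i = offsets[user]; i < offsets[user + 1]; i++) {
      visitor.visit(user, itemIndexes[i], values[i]);
    }
  }

  // Binary search within the user's sorted run: O(log n_u) random access,
  // a middle ground between a hash map and a full linear scan.
  float preferenceValue(int user, int item) {
    int lo = offsets[user];
    int hi = offsets[user + 1] - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (itemIndexes[mid] < item) {
        lo = mid + 1;
      } else if (itemIndexes[mid] > item) {
        hi = mid - 1;
      } else {
        return values[mid];
      }
    }
    return Float.NaN; // no preference recorded
  }

  interface PreferenceVisitor {
    void visit(int user, int item, float value);
  }
}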

Gokhan


On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter <s...@apache.org> wrote:

> I completely agree, Netflix is less than one gigabyte in a smart
> representation; 12x more memory is a no-go. The techniques used in
> FactorizablePreferences allow a much more memory-efficient representation,
> tested on the KDD Music dataset, which is approx. 2.5 times the size of
> Netflix and fits into 3GB with that approach.
>
>
> 2013/7/16 Ted Dunning <ted.dunn...@gmail.com>
>
> > Netflix is a small dataset.  12G for that seems quite excessive.
> >
> > Note also that this is before you have done any work.
> >
> > Ideally, 100 million observations should take << 1GB.
> >
> > On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <pc...@uowmail.edu.au>
> > wrote:
> >
> > > The second idea is indeed splendid: we should separate the
> > > time-complexity-first and space-complexity-first implementations. What
> > > I'm not quite sure about is whether we really need to create two
> > > interfaces instead of one. Personally, I think 12G of heap space is not
> > > that high, right? Most new laptops can already handle that (emphasis on
> > > laptop). And if we replace the hash map (the culprit behind the high
> > > memory consumption) with a list/linked list, lookups would simply
> > > degrade to an O(n) linear search, which is not too bad either. The
> > > current DataModel is the result of careful thought and has undergone
> > > extensive testing; it is easier to expand on top of it than to
> > > subvert it.
> >
>
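
As a rough sanity check on the memory numbers in the thread above (my own
back-of-the-envelope, assuming 4-byte ints and floats): Netflix has about
100 million ratings, so a flat (userIndex, itemIndex, value) triple layout
costs 100M x 12 bytes, roughly 1.2GB. Making the user index implicit
through an offsets array, as in the sketch above, drops that to 100M x 8
bytes, roughly 0.8GB; and since Netflix has only ~17.8K movies, item
indexes fit into 2-byte shorts and ratings into single bytes, which gets
down to about 300MB. That is consistent with "less than one gigabyte in a
smart representation".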
