I see, OK, so we shouldn't use the old implementation. But the old
interface doesn't have to be discarded. The discrepancy between your
FactorizablePreferences and DataModel is that your model supports
getPreferences(), which returns all preferences as an iterator, while
DataModel supports a few old functions that return preferences for an
individual user or item.
My point is that it is not hard for each of them to implement what they
lack: the old DataModel can implement getPreferences() with a simple
loop in an abstract class, and your new FactorizablePreferences can
implement the old per-user/per-item functions with a binary search that
takes O(log n) time, or an interpolation search that takes O(log log n)
time on average. The same goes for the online update. It becomes a
matter of different speed and space trade-offs, not of different
interface standards: we can keep the old unit tests, old examples, old
everything. And we will be more flexible in writing an ensemble
recommender.
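The adapter idea above can be sketched roughly as follows. The class and
method names here (Preference, FlatPreferenceStore, getPreferencesFromUser)
are illustrative stand-ins, not Mahout's actual API: a flat store sorted by
user ID gives the new-style bulk iterator for free, and recovers the old
per-user accessor with an O(log n) binary search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-in for a single (user, item, rating) triple.
class Preference {
    final long userID;
    final long itemID;
    final float value;
    Preference(long userID, long itemID, float value) {
        this.userID = userID; this.itemID = itemID; this.value = value;
    }
}

// FactorizablePreferences-style store: all preferences kept in one flat
// list, sorted by user ID, so per-user lookups can use binary search.
class FlatPreferenceStore {
    private final List<Preference> prefs; // must be sorted by userID

    FlatPreferenceStore(List<Preference> sortedByUser) {
        this.prefs = sortedByUser;
    }

    // The "new" bulk accessor: trivially an iterator over everything.
    Iterable<Preference> getPreferences() {
        return prefs;
    }

    // Old DataModel-style accessor, recovered with an O(log n) binary
    // search for the user's first preference, then a short linear scan.
    List<Preference> getPreferencesFromUser(long userID) {
        int lo = 0, hi = prefs.size(); // lower bound on userID
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (prefs.get(mid).userID < userID) lo = mid + 1;
            else hi = mid;
        }
        List<Preference> result = new ArrayList<>();
        for (int i = lo; i < prefs.size() && prefs.get(i).userID == userID; i++) {
            result.add(prefs.get(i));
        }
        return result;
    }
}
```

The reverse direction (an abstract DataModel looping over its users to
emit getPreferences()) is the same trick with the roles swapped.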
Just a few thoughts; I'll have to validate the idea first before
creating a new JIRA ticket.
Yours Peng
On 13-07-16 02:51 PM, Sebastian Schelter wrote:
I completely agree, Netflix is less than one gigabyte in a smart
representation; 12x more memory is a no-go. The techniques used in
FactorizablePreferences allow a much more memory-efficient representation,
tested on the KDD Music dataset, which is approx 2.5 times the size of
Netflix and fits into 3GB with that approach.
2013/7/16 Ted Dunning <ted.dunn...@gmail.com>
Netflix is a small dataset. 12GB for that seems quite excessive.
Note also that this is before you have done any work.
Ideally, 100 million observations should take << 1GB.
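As a rough back-of-envelope for that claim, one plausible compact layout is
CSR-like: ratings sorted by user, one int item id plus one byte rating per
observation, and one int offset per user. The sizes and counts below are
illustrative assumptions, not measurements of any Mahout class.

```java
// Back-of-envelope memory estimate for ~100 million observations in a
// hypothetical CSR-like layout (not an actual Mahout data structure).
public class MemoryEstimate {
    // Bytes for `observations` entries (4-byte item id + 1-byte rating)
    // plus (users + 1) 4-byte row offsets.
    static long csrBytes(long observations, long users) {
        return observations * (4 + 1) + (users + 1) * 4;
    }

    public static void main(String[] args) {
        // Netflix-scale counts, assumed for illustration.
        long bytes = csrBytes(100_000_000L, 480_000L);
        System.out.printf("%.2f GB%n", bytes / 1e9); // ≈ 0.50 GB
    }
}
```

With 4-byte float ratings instead of bytes this roughly doubles but still
stays under 1GB, which is the gap being pointed out versus a 12GB heap.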
On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <pc...@uowmail.edu.au> wrote:
The second idea is indeed splendid: we should separate the
time-complexity-first and the space-complexity-first implementations.
What I'm not quite sure about is whether we really need to create two
interfaces instead of one.
Personally, I think 12GB of heap space is not that high, right? Most new
laptops can already handle that (emphasis on laptop). And if we replace
the hash map (the culprit of the high memory consumption) with a
list/linked list, it would simply degrade lookup to an O(n) linear
search, which is not too bad either. The current DataModel is the result
of careful thought and has undergone extensive testing; it is easier to
expand on top of it than to subvert it.
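The trade-off being described can be made concrete with a small sketch.
This is not Mahout code; it just contrasts a boxed HashMap (O(1) lookup,
heavy per-entry overhead from boxed keys, boxed values, and table nodes)
with parallel primitive arrays (compact, but O(n) lookup via linear scan,
or O(log n) if kept sorted).

```java
import java.util.HashMap;
import java.util.Map;

public class LookupTradeoff {
    // Compact representation: two parallel primitive arrays, no
    // per-entry objects, looked up with an O(n) linear scan.
    static float linearLookup(long[] itemIDs, float[] ratings, long itemID) {
        for (int i = 0; i < itemIDs.length; i++) {
            if (itemIDs[i] == itemID) return ratings[i];
        }
        return Float.NaN; // not rated
    }

    public static void main(String[] args) {
        long[] itemIDs = {5, 9, 42};
        float[] ratings = {3.0f, 4.5f, 2.0f};

        // Hash-map representation: O(1) lookup, but every entry carries
        // a boxed Long key, a boxed Float value, and a table node.
        Map<Long, Float> map = new HashMap<>();
        for (int i = 0; i < itemIDs.length; i++) {
            map.put(itemIDs[i], ratings[i]);
        }

        System.out.println(map.get(42L));                        // O(1) time
        System.out.println(linearLookup(itemIDs, ratings, 42L)); // O(1) space
    }
}
```

Both lookups return the same rating; the two layouts only differ in where
they spend memory versus time, which is the point of the paragraph above.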