I see, OK, so we shouldn't use the old implementation. But the old interface doesn't have to be discarded. The discrepancy between your FactorizablePreferences and DataModel is that your model supports getPreferences(), which returns all preferences as an iterator, while DataModel supports a few old functions that return preferences for an individual user or item.

My point is that it is not hard for each of them to implement what it lacks: the old DataModel can implement getPreferences() with a simple loop in the abstract class, and your new FactorizablePreferences can implement the old per-user/per-item functions with a binary search that takes O(log n) time, or an interpolation search that takes O(log log n) time on average. The same goes for the online update. It becomes a matter of different speed and space trade-offs, not a different interface standard, so we can keep the old unit tests, old examples, old everything. And we would be more flexible when writing an ensemble recommender.
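To make it concrete, here is a rough sketch of both directions (the type and method names below are stand-ins of mine, not the actual Mahout signatures):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Minimal sketch only: these types are illustrative stand-ins, not the
// real DataModel / FactorizablePreferences interfaces.
final class Pref {
  final long userID;
  final long itemID;
  final float value;
  Pref(long userID, long itemID, float value) {
    this.userID = userID;
    this.itemID = itemID;
    this.value = value;
  }
}

abstract class OldStyleModel {
  abstract Iterable<Long> getUserIDs();
  abstract List<Pref> getPreferencesFromUser(long userID);

  // What the old side lacks: getPreferences(), added as a plain loop in
  // the abstract class, concatenating every user's preferences.
  Iterator<Pref> getPreferences() {
    List<Pref> all = new ArrayList<Pref>();
    for (long userID : getUserIDs()) {
      all.addAll(getPreferencesFromUser(userID));
    }
    return all.iterator();
  }
}

class FlatModel {
  // What the new side lacks: per-user lookup. With all preferences in one
  // array sorted by userID, a binary search finds the start of a user's
  // contiguous block in O(log n), then a short scan collects it.
  private final Pref[] prefs; // must be sorted by userID

  FlatModel(Pref[] prefsSortedByUser) {
    this.prefs = prefsSortedByUser;
  }

  List<Pref> getPreferencesFromUser(long userID) {
    int lo = 0;
    int hi = prefs.length;
    while (lo < hi) {            // find first index whose userID >= target
      int mid = (lo + hi) >>> 1;
      if (prefs[mid].userID < userID) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    List<Pref> result = new ArrayList<Pref>();
    for (int i = lo; i < prefs.length && prefs[i].userID == userID; i++) {
      result.add(prefs[i]);
    }
    return result;
  }
}

The interpolation-search variant would only change how mid is chosen, betting on roughly uniformly distributed IDs.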

Just a few thoughts; I'll have to validate the idea first before creating a new JIRA ticket.

Yours Peng


On 13-07-16 02:51 PM, Sebastian Schelter wrote:
I completely agree, Netflix is less than one gigabyte in a smart
representation; 12x more memory is a no-go. The techniques used in
FactorizablePreferences allow a much more memory-efficient representation,
tested on the KDD Music dataset, which is approx 2.5 times the size of
Netflix and fits into 3GB with that approach.


2013/7/16 Ted Dunning <ted.dunn...@gmail.com>

Netflix is a small dataset.  12G for that seems quite excessive.

Note also that this is before you have done any work.

Ideally, 100 million observations should take << 1GB.
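A quick back-of-envelope check of that bound in code, assuming a flat layout of int userID + int itemID + float rating per observation (my assumption, not the actual representation):

public class MemoryEnvelope {
  public static void main(String[] args) {
    // One observation = int userID + int itemID + float rating,
    // held in parallel primitive arrays (no per-object overhead).
    long observations = 100000000L;
    long bytesPerObservation = 4 + 4 + 4;   // two ints + one float
    long rawBytes = observations * bytesPerObservation;
    System.out.printf("raw: %.1f GB%n", rawBytes / 1e9); // ~1.2 GB
    // Delta/varint coding of the sorted IDs shrinks this further; boxed
    // objects in a hash map can easily cost an order of magnitude more.
  }
}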

On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <pc...@uowmail.edu.au> wrote:

The second idea is indeed splendid; we should separate the
time-complexity-first and space-complexity-first implementations. What I'm
not quite sure about is whether we really need to create two interfaces
instead of one. Personally, I think 12G of heap space is not that high,
right? Most new laptops can already handle that (emphasis on laptop). And
if we replace the hash map (the culprit of the high memory consumption)
with a list/linked list, it would simply degrade lookup to an O(n) linear
search, which is not too bad either. The current DataModel is the result of
careful thought and has undergone extensive testing; it is easier to expand
on top of it than to subvert it.
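A sketch of that trade-off (hypothetical types of mine, not the actual GenericDataModel internals):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TradeOff {
  // Space-hungry but O(1) per-user lookup: boxed keys plus hash-node
  // overhead is where the high memory consumption comes from.
  Map<Long, float[]> byUser = new HashMap<Long, float[]>();

  // Compact alternative: plain parallel lists, but finding one user's
  // row becomes a linear scan, O(n).
  List<Long> userIDs = new ArrayList<Long>();
  List<float[]> rows = new ArrayList<float[]>();

  float[] findRow(long userID) {
    for (int i = 0; i < userIDs.size(); i++) {
      if (userIDs.get(i) == userID) { // Long unboxes for the comparison
        return rows.get(i);
      }
    }
    return null; // unknown user
  }
}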

