Mmm, you are right, the simplest solution is usually the best. I'm creating a new JIRA ticket.

Yours Peng

On 13-07-17 04:58 PM, Sebastian Schelter wrote:
Hi Peng,

I never wanted to discard the old interface, I just wanted to split it up.
I want to have a simple interface that only supports sequential access (and
allows for very memory-efficient implementations, e.g. by the use of
primitive arrays). DataModel should *extend* this interface and provide
sequential and random access (basically what it already does).

Then a recommender such as SGD could state that it only needs sequential
access to the preferences, and you can either feed it a DataModel (so we
don't break backwards compatibility) or a memory-efficient sequential
access thingy.
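A rough sketch of what that split might look like; the interface and class names here are hypothetical, chosen only to illustrate the idea, not actual Mahout types:

```java
import java.util.Iterator;

// Hypothetical sketch of the proposed split. A minimal sequential-access
// interface that memory-efficient stores can implement; a DataModel-like
// interface would *extend* it and add random access on top.
interface SequentialAccessPreferences {
    // One pass over all (userID, itemID, rating) triples.
    Iterator<float[]> getPreferences();
    int numPreferences();
}

// A memory-efficient implementation backed by parallel primitive arrays,
// so there is no per-preference object overhead.
class ArrayBackedPreferences implements SequentialAccessPreferences {
    private final int[] userIDs;
    private final int[] itemIDs;
    private final float[] ratings;

    ArrayBackedPreferences(int[] userIDs, int[] itemIDs, float[] ratings) {
        this.userIDs = userIDs;
        this.itemIDs = itemIDs;
        this.ratings = ratings;
    }

    public int numPreferences() {
        return ratings.length;
    }

    public Iterator<float[]> getPreferences() {
        return new Iterator<float[]>() {
            private int i = 0;
            public boolean hasNext() { return i < ratings.length; }
            public float[] next() {
                // Emit the current triple and advance the cursor.
                float[] triple = { userIDs[i], itemIDs[i], ratings[i] };
                i++;
                return triple;
            }
        };
    }
}
```

An SGD-style recommender would then declare `SequentialAccessPreferences` as its input type, and a DataModel (extending that interface) would still be accepted unchanged.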

Does that make sense to you?


2013/7/17 Peng Cheng <pc...@uowmail.edu.au>

I see, OK so we shouldn't use the old implementation. But I mean, the old
interface doesn't have to be discarded. The discrepancy between your
FactorizablePreferences and DataModel is that, your model supports
getPreferences(), which returns all preferences as an iterator, and
DataModel supports a few old functions that return preferences for an
individual user or item.

My point is that it is not hard for each of them to implement what they
lack: the old DataModel can implement getPreferences() just by a loop in
an abstract class, and your new FactorizablePreferences can implement those
old functions by a binary search that takes O(log n) time, or an
interpolation search that takes O(log log n) time on average. The same goes
for the online update. It will just be a matter of different speed and
space, not a different interface standard; we can use the old unit tests,
old examples, old everything. And we will be more flexible in writing
ensemble recommenders.

Just a few thoughts, I'll have to validate the idea first before creating
a new JIRA ticket.

Yours Peng



On 13-07-16 02:51 PM, Sebastian Schelter wrote:

I completely agree, Netflix is less than one gigabyte in a smart
representation; 12x more memory is a no-go. The techniques used in
FactorizablePreferences allow a much more memory-efficient representation,
tested on the KDD Music dataset, which is approx 2.5 times Netflix and fits
into 3GB with that approach.


2013/7/16 Ted Dunning <ted.dunn...@gmail.com>

Netflix is a small dataset. 12G for that seems quite excessive.
Note also that this is before you have done any work.

Ideally, 100million observations should take << 1GB.
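A back-of-envelope check of that figure, assuming each observation is stored as primitives (one int user ID, one int item ID, one float rating; this layout is an assumption for illustration, not what Mahout does today):

```java
// Back-of-envelope memory estimate for 100 million observations
// stored as raw primitives: int user ID + int item ID + float rating.
public class MemoryEstimate {
    public static void main(String[] args) {
        long observations = 100_000_000L;
        long bytesPerObservation = 4 + 4 + 4; // two ints, one float
        long totalBytes = observations * bytesPerObservation;
        // About 1.12 GiB of raw payload. An object-per-preference layout
        // multiplies this several times over through object headers and
        // references, which is where figures like 12G come from.
        System.out.printf("%.2f GiB%n",
            totalBytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```

So even the primitive layout sits just above 1 GiB for 100 million triples; getting well below that would additionally require tricks like delta encoding or dropping one of the ID columns.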

On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <pc...@uowmail.edu.au>
wrote:

The second idea is indeed splendid, we should separate the
time-complexity-first and space-complexity-first implementations. What I'm
not quite sure about is whether we really need to create two interfaces
instead of one. Personally, I think 12G heap space is not that high, right?
Most new laptops can already handle that (emphasis on laptop). And if we
replace the hash map (the culprit of the high memory consumption) with a
list/linked list, it would simply degrade lookups to a linear search taking
O(n) time, not too bad either. The current DataModel is the result of
careful thought and has undergone extensive testing; it is easier to expand
on top of it than to subvert it.



