Yeah, setPreference() and removePreference() shouldn't be there, but injecting the Recommender back into the DataModel creates a rather strong dependency, which may intermingle components with different concerns. Maybe we can do something with the RefreshHelper class? E.g. push something into a swap field so that the downstream of a refreshable chain can read it out. I have read Gokhan's UpdateAwareDataModel, and feel that it's probably too heavyweight for a model selector, as every time he changes the algorithm he has to re-register it.
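
Roughly what I have in mind, as a sketch (SwapField and SwapAwareComponent are made-up names just for illustration; only Refreshable is the real Mahout interface):

import java.util.Collection;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.mahout.cf.taste.common.Refreshable;

/**
 * Hypothetical holder that an upstream member of a refreshable chain
 * writes into during refresh() and that downstream members read out.
 */
public final class SwapField<T> {
  private final AtomicReference<T> value = new AtomicReference<T>();

  public void push(T newValue) {
    value.set(newValue);
  }

  public T read() {
    return value.get();
  }
}

/** Downstream consumer that picks the swapped value up on refresh. */
class SwapAwareComponent implements Refreshable {
  private final SwapField<long[]> changedUserIDs;

  SwapAwareComponent(SwapField<long[]> changedUserIDs) {
    this.changedUserIDs = changedUserIDs;
  }

  @Override
  public void refresh(Collection<Refreshable> alreadyRefreshed) {
    long[] changed = changedUserIDs.read();
    if (changed != null) {
      // update only the parts of the model touched by these users
    }
  }
}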

The second idea is indeed splendid: we should separate the time-complexity-first and space-complexity-first implementations. What I'm not quite sure about is whether we really need to create two interfaces instead of one. Personally, I think 12G of heap space is not that high, right? Most new laptops can already handle that (emphasis on laptop). And if we replaced the hash map (the culprit of the high memory consumption) with a list/LinkedList, it would simply degrade lookups to an O(n) linear search, which is not too bad either. The current DataModel is the result of careful thought and has undergone extensive testing; it is easier to expand on top of it than to subvert it.
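
To make the space/time tradeoff concrete, a sketch of what flat storage could look like (all names here are hypothetical, this is just to illustrate the O(n) lookup):

/**
 * Sketch of a compact preference store: parallel arrays instead of a
 * hash map. Memory drops to roughly 12 bytes per rating, but point
 * lookups become an O(n) linear scan.
 */
public final class CompactPreferences {
  private final long[] userIDs;
  private final long[] itemIDs;
  private final float[] values;

  public CompactPreferences(long[] userIDs, long[] itemIDs, float[] values) {
    this.userIDs = userIDs;
    this.itemIDs = itemIDs;
    this.values = values;
  }

  /** O(n) linear scan replacing the O(1) hash lookup. */
  public Float getPreferenceValue(long userID, long itemID) {
    for (int i = 0; i < userIDs.length; i++) {
      if (userIDs[i] == userID && itemIDs[i] == itemID) {
        return values[i];
      }
    }
    return null;
  }
}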

All the best,
Yours Peng

On 13-07-16 01:05 AM, Sebastian Schelter wrote:
Hi Gokhan,

I like your proposals and I think this is an important discussion. Peng
is also interested in working on online recommenders, so we should try
to team up our efforts. I'd like to extend the discussion a little to
related API changes that I think are necessary.

What do you think about completely removing the setPreference() and
removePreference() methods from Recommender? I think they don't belong
there for two reasons: First, they duplicate functionality from
DataModel and second, a lot of recommenders are read-only/train-once and
cannot handle single preference updates anyway.

I think we should have a DataModel implementation that can be updated,
and an online learning recommender should be able to register to be
notified of updates.
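
A minimal sketch of what that registration could look like (the listener and interface names below are made up for illustration, nothing final):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.model.DataModel;

/** Hypothetical callback for recommenders that learn online. */
interface PreferenceUpdateListener {
  void onSetPreference(long userID, long itemID, float value) throws TasteException;
  void onRemovePreference(long userID, long itemID) throws TasteException;
}

/** Hypothetical updatable DataModel that notifies registered listeners. */
interface UpdatableDataModel extends DataModel {
  void addUpdateListener(PreferenceUpdateListener listener);
  void removeUpdateListener(PreferenceUpdateListener listener);
}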

We should furthermore split up the DataModel interface into a hierarchy
of three parts:

First, a simple read-only interface that allows sequential access to the
data (similar to FactorizablePreferences). This allows us to create
memory-efficient implementations. E.g. Cheng reported in MAHOUT-1272
that the current DataModel needs 12GB of heap for the Netflix dataset
(100M ratings), which is unacceptable. I was able to fit the KDD Music
dataset (250M ratings) into 3GB with FactorizablePreferences.

The second interface would extend the read-only interface and should
resemble what DataModel is today: an easy-to-use in-memory
implementation that trades high memory consumption for convenient
random access.

And finally the third interface would extend the second and provide
tooling for online updates of the data.
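
Just to illustrate, the hierarchy could look something like this (all names are placeholders except Preference itself, nothing final):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.model.Preference;

/** 1) Read-only, sequential access; enables compact storage. */
interface SequentialPreferences {
  int getNumUsers() throws TasteException;
  long getNumPreferences() throws TasteException;
  /** one pass over all ratings */
  Iterable<Preference> getPreferences() throws TasteException;
}

/** 2) Adds the convenient random access the current DataModel offers. */
interface RandomAccessPreferences extends SequentialPreferences {
  Float getPreferenceValue(long userID, long itemID) throws TasteException;
}

/** 3) Adds tooling for online updates on top of random access. */
interface MutablePreferences extends RandomAccessPreferences {
  void setPreference(long userID, long itemID, float value) throws TasteException;
  void removePreference(long userID, long itemID) throws TasteException;
}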

What do you think of that? Does it sound reasonable?

--sebastian


The DataModel I imagine would follow the current API, where the
underlying preference storage is replaced with a matrix.

A Recommender would then use the DataModel and the OnlineLearner, where
Recommender#setPreference is delegated to DataModel#setPreference (like it
does now), and DataModel#setPreference triggers OnlineLearner#train.
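
Roughly like this (a sketch only; OnlineLearner and the class name are placeholders, and a real implementation would map long IDs to matrix indices):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.math.SparseRowMatrix;

/** Hypothetical incremental learner interface. */
interface OnlineLearner {
  void train(long userID, long itemID, float value) throws TasteException;
}

/** Matrix-backed model that forwards each update to the learner. */
class MatrixBackedDataModel /* implements DataModel */ {
  private final SparseRowMatrix preferences;
  private final OnlineLearner learner;

  MatrixBackedDataModel(SparseRowMatrix preferences, OnlineLearner learner) {
    this.preferences = preferences;
    this.learner = learner;
  }

  public void setPreference(long userID, long itemID, float value) throws TasteException {
    // store the rating (IDs cast directly here for brevity)
    preferences.setQuick((int) userID, (int) itemID, value);
    // trigger online training on the single new data point
    learner.train(userID, itemID, value);
  }
}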
