On Wed, Apr 22, 2009 at 3:29 PM, Mirko Gontek <[email protected]> wrote:
> Hi Sean,
> when you say that the FileDataModel originally was intended to be read-only
> I get the impression that I am on the wrong track. Maybe you could comment
> on my thoughts, this would be great help..

FileDataModel was read-only in the sense that setPreference() and
removePreference() did not work. FileDataModel would only change if
the file it reads changed. But now that is not so -- you can call
these methods to temporarily change the data in memory. This may make
sense if you want to update your file *and* quickly update the
in-memory representation without re-reading the file. (I still
probably wouldn't architect it this way, though; I would just reload
everything infrequently.)
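For illustration, that write-through-plus-refresh idea looks roughly like this in plain Java. The class and method names here are made up for the sketch; this is not the Taste API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the pattern above: mutate the backing store *and* the
// in-memory copy at once, and only do a full reload on refresh().
class CachedPrefs {
    private final Map<String, Float> backingStore; // stands in for the file
    private Map<String, Float> inMemory;

    CachedPrefs(Map<String, Float> backingStore) {
        this.backingStore = backingStore;
        refresh();
    }

    // Write through to the store and patch the in-memory copy immediately,
    // so readers see the change without re-reading everything.
    void setPreference(String userItem, float value) {
        backingStore.put(userItem, value);
        inMemory.put(userItem, value);
    }

    Float getPreference(String userItem) {
        return inMemory.get(userItem);
    }

    // The alternative I'd lean toward: just reload everything, infrequently.
    void refresh() {
        inMemory = new HashMap<>(backingStore);
    }
}
```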

> I would like to implement a GenericItemBasedRecommender, my testdata is a DB
> with 300.000 Preferences (130.000 items, 12.000 users).

Typically, if you have many more items than users, you would prefer a
user-based recommender, for performance. This is because a user-based
recommender compares the user to all other users, while an item-based
recommender compares an item to all other items.

But, an item-based recommender could be fast and appropriate if you
have a very efficient source of item-item similarities -- see below.

>
> 1) I implement a DataModel that initially loads all data from the DB into
> memory and works with the data in memory from that point on. My DataModel
> implementation only accesses (read/write) the DB on refresh().

That's OK. In general, the idea was that DataModels do not cache any
information. They are always the current, authoritative source of
information. (FileDataModel is kind of exceptional since there is no
other efficient way to operate but load and store data in memory.) So
this is why the JDBC data models do not store in memory. Other
components cache and store things in memory.

It is fine, however, to proceed the way you propose: if you have
enough memory, keeping everything in it is far faster.
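A minimal sketch of that structure in plain Java, with a hypothetical DAO standing in for your DB access layer. None of these names are Taste interfaces; the point is only that reads never touch the DB, and refresh() is the one place that does:

```java
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical DB access layer (not a Taste interface).
interface PreferenceDao {
    Map<Long, Map<Long, Float>> loadAllPreferences(); // userID -> (itemID -> value)
}

// Sketch of point 1: pull everything from the DB once, serve all
// reads from memory, and hit the DB again only in refresh().
class InMemoryDbDataModel {
    private final PreferenceDao dao;
    private Map<Long, Map<Long, Float>> prefs;

    InMemoryDbDataModel(PreferenceDao dao) {
        this.dao = dao;
        refresh();
    }

    float getPreferenceValue(long userID, long itemID) {
        Map<Long, Float> userPrefs = prefs.get(userID);
        Float v = (userPrefs == null) ? null : userPrefs.get(itemID);
        if (v == null) throw new NoSuchElementException("no preference");
        return v;
    }

    int getNumUsers() {
        return prefs.size();
    }

    // The only method that touches the DB.
    void refresh() {
        prefs = dao.loadAllPreferences();
    }
}
```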


> 2) For the recommender to be fast, I need pre-computed ItemItemSimilarities.
> Thus, I implement ItemSimilarity. My implementation keeps all
> ItemItemSimilarities in memory, until refresh(). Like above, my
> ItemSimilarity implementation only accesses (read/write) the DB on
> refresh().

Yep, that is appropriate.
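The same shape works for the similarities; a rough sketch, again with an invented DAO rather than the real ItemSimilarity interface. Storing each pair once under a canonical key keeps the lookup symmetric:

```java
import java.util.Map;

// Hypothetical DB access layer for precomputed similarities.
interface SimilarityDao {
    Map<String, Double> loadAllSimilarities(); // "itemA:itemB" -> similarity
}

// Sketch of point 2: all item-item similarities live in memory and
// are only re-read from the DB on refresh().
class InMemoryItemSimilarity {
    private final SimilarityDao dao;
    private Map<String, Double> sims;

    InMemoryItemSimilarity(SimilarityDao dao) {
        this.dao = dao;
        refresh();
    }

    // Canonical key so that (a,b) and (b,a) find the same entry.
    private static String key(long a, long b) {
        return (a < b) ? a + ":" + b : b + ":" + a;
    }

    double itemSimilarity(long itemA, long itemB) {
        Double s = sims.get(key(itemA, itemB));
        return (s == null) ? Double.NaN : s; // unknown pair
    }

    void refresh() {
        sims = dao.loadAllSimilarities();
    }
}
```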


> 3) Since I don't have a good method to calculate item similarities yet, I
> want to use the following to generate itemSimilarities once:
> MyItemSimilarityImpl itemSimilarity = new GenericItemSimilarity(new
> PearsonCorrelationSimilarity(dataModel), dataModel, maxToKeep);

That's OK. One of the main strengths of item-based recommenders is
that you can meaningfully inject an external, additional notion of
item similarity, and add more information that way. Here you are not
adding more info than is already in the model, but it certainly works.
Later you might substitute a different measure.
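The idea behind that maxToKeep argument can be sketched in plain Java: score every item pair (Pearson correlation here, over aligned rating vectors), then retain only the most similar pairs. This illustrates the idea, not Mahout's actual implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class SimilarityPrecompute {

    static final class ItemPair {
        final long itemA, itemB;
        final double sim;
        ItemPair(long a, long b, double s) { itemA = a; itemB = b; sim = s; }
    }

    // Standard Pearson correlation of two equal-length vectors.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        double num = n * sxy - sx * sy;
        double den = Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
        return den == 0 ? 0 : num / den;
    }

    // Score every item pair, sort most-similar first, keep the best
    // maxToKeep -- the same idea as GenericItemSimilarity's cap.
    static List<ItemPair> precompute(Map<Long, double[]> ratings, int maxToKeep) {
        List<Long> ids = new ArrayList<>(ratings.keySet());
        Collections.sort(ids);
        List<ItemPair> all = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                all.add(new ItemPair(ids.get(i), ids.get(j),
                        pearson(ratings.get(ids.get(i)), ratings.get(ids.get(j)))));
            }
        }
        all.sort((p, q) -> Double.compare(q.sim, p.sim));
        return all.subList(0, Math.min(maxToKeep, all.size()));
    }
}
```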


> My question is: is it good practice to keep all data in memory until
> refresh? I mean, memory is of course limited, so memory-based DataModel (or
> ItemSimilarity) implementations are limited, right? (For this reason I
> looked to FileDataModel).

Yes, I would recommend you use memory as much as possible. At some
point you will not be able to, of course. Then I think you would
resort to a JDBC-based data model which does *not* read into memory,
and you might store pre-computed item-item similarities in a DB rather
than in memory. That will slow things down, of course, but becomes
necessary.
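When similarities live only in the DB, each lookup becomes a query. A rough sketch, with a hypothetical DAO standing in for a real JDBC SELECT against an item-similarity table:

```java
// Stand-in for a JDBC query, e.g. a SELECT on an item_similarity table.
interface PairSimilarityDao {
    Double lookup(long itemA, long itemB); // null if the pair is unknown
}

// Sketch of the fallback: no cache, so every call is a round trip to
// the DB -- slower, but memory use stays bounded.
class DbBackedItemSimilarity {
    private final PairSimilarityDao dao;

    DbBackedItemSimilarity(PairSimilarityDao dao) {
        this.dao = dao;
    }

    double itemSimilarity(long itemA, long itemB) {
        // Normalize the pair order so (a,b) and (b,a) hit the same row.
        Double s = dao.lookup(Math.min(itemA, itemB), Math.max(itemA, itemB));
        return (s == null) ? Double.NaN : s;
    }
}
```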
