+1 from me on this: anywhere we use JDBC / SQL, we could be using
a NoSQL data store and scale better, because I don't *think* we rely on
any fancy SQL joins, grouping, or aggregation.

One thing I wonder, Sean: what if you used, say, Voldemort to store the rows
of the ItemSimilarity matrix and the user-item preferences matrix? You could
compute recommendations for an item-based recommender on the fly by doing
a get() for all the items a user has rated, then a multiGet() on all the
keys returned, then recommending in the usual fashion. That's two remote
calls to a key-value store in the course of producing recommendations, but
something like Voldemort or Cassandra would return its responses in basically
no time (it's usually all in memory), and from experience, the latency you'd
get would be pretty reasonable.
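
To make the two-call flow concrete, here is a toy sketch in Python. Everything
in it is an assumption for illustration: KVStore is an in-memory stand-in for
a remote store, multi_get() just mirrors the multiGet() mentioned above and is
not any client's real method name, and the scoring is a plain
similarity-weighted average rather than what Taste actually does.

```python
from collections import defaultdict


class KVStore:
    """Hypothetical in-memory stand-in for a remote key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def multi_get(self, keys):
        # One round trip for many keys; missing keys are simply omitted.
        return {k: self._data[k] for k in keys if k in self._data}


def recommend(user_id, prefs_store, sim_store, how_many=3):
    # Remote call 1: the user's row of the user-item preferences matrix.
    prefs = prefs_store.get(user_id) or {}
    # Remote call 2: the similarity rows for every item the user has rated.
    sim_rows = sim_store.multi_get(list(prefs))

    weighted = defaultdict(float)
    total_sim = defaultdict(float)
    for item, rating in prefs.items():
        for other, sim in sim_rows.get(item, {}).items():
            if other in prefs:
                continue  # never recommend something already rated
            weighted[other] += sim * rating
            total_sim[other] += abs(sim)

    # Estimated preference = similarity-weighted average of the user's ratings.
    scores = {i: weighted[i] / total_sim[i] for i in weighted if total_sim[i] > 0}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:how_many]


prefs_store = KVStore()
sim_store = KVStore()
prefs_store.put("u1", {"A": 5.0, "B": 3.0})
sim_store.put("A", {"B": 0.5, "C": 0.8, "D": 0.2})
sim_store.put("B", {"A": 0.5, "C": 0.2, "D": 0.6})

top = recommend("u1", prefs_store, sim_store)
print(top)
```

Only the two store calls would touch the network; the scoring loop is local
and proportional to the number of rated items times the similarity row length.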

Seems like it would be a nice integration to try out.  The advantage of
that kind of (on-the-fly) recommender is that you could change the way
you compute recommendations (i.e. the model) on a per-query basis
if need be, and if new items were added to a user's list of things
they'd rated, that user's row in the key-value store could be updated
on the fly too (the ItemSimilarity matrix would drift out of date, sure,
but it could be batch updated periodically).
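
A rough sketch of those two properties, again in Python and again with made-up
names: plain dicts stand in for the two stores, and weighted_sum,
similarity_only, and add_rating are hypothetical helpers, not anything in
Mahout. The scorer is passed in per query, and a new rating rewrites only that
user's row while the similarity rows stay as the last batch job left them.

```python
# Plain dicts stand in for the two remote stores (assumption for illustration).
prefs_store = {"u1": {"A": 5.0}}  # user id -> {item: rating}
sim_store = {                      # item id -> {other item: similarity}
    "A": {"B": 0.9, "C": 0.4},
    "B": {"A": 0.9, "C": 0.7},
}


def weighted_sum(rating, sim):
    """One possible model: similarity scaled by the user's rating."""
    return rating * sim


def similarity_only(rating, sim):
    """Another possible model: ignore the rating, use similarity alone."""
    return sim


def recommend(user_id, scorer):
    # The scorer is chosen per query, so the model can change between calls.
    prefs = prefs_store.get(user_id, {})
    scores = {}
    for item, rating in prefs.items():
        for other, sim in sim_store.get(item, {}).items():
            if other not in prefs:
                scores[other] = scores.get(other, 0.0) + scorer(rating, sim)
    return sorted(scores, key=scores.get, reverse=True)


def add_rating(user_id, item, rating):
    # Read-modify-write of a single row; sim_store is left untouched.
    row = dict(prefs_store.get(user_id, {}))
    row[item] = rating
    prefs_store[user_id] = row


before = recommend("u1", weighted_sum)
add_rating("u1", "B", 4.0)           # user rates a new item
after = recommend("u1", weighted_sum)
alt = recommend("u1", similarity_only)
print(before, after, alt)
```

In a real store the read-modify-write in add_rating would need whatever
conflict handling the store provides; that part is glossed over here.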

What do you guys think?

  -jake


On Mon, May 31, 2010 at 10:08 AM, Ted Dunning <[email protected]> wrote:

> Yes.  This is clearly feasible.  Everywhere JDBC is used, NoSQL could be
> used as well, perhaps to substantial advantage.
>
> On Mon, May 31, 2010 at 9:57 AM, Florent Empis <[email protected]> wrote:
>
> > org.apache.mahout.cf.taste.impl.recommender.slopeone.jdbc.*
> > and
> > org.apache.mahout.cf.taste.impl.similarity.jdbc.*
> >
> > The data structure used by these classes is very simple, hence I thought it
> > might make sense to store them in a process with less overhead than a
> > full-blown RDBMS.
> > The advantage of using a distributed key-value system for these seemed
> > obvious to me: several nodes providing resilience, and allowing for
> > scalability on the querying side of things...
> >
>
