Hi Sean,
>> 1. Findory's personalization used a type of hybrid collaborative filtering
>> algorithm that recommended articles based on a combination of
>> similarity of content and articles that tended to interest other
>> Findory users with similar tastes.
> Interesting -- yeah, that would be a hybrid of user-based and
> item-based approaches.
When you say a hybrid of user-based and item-based approaches (since both are
forms of the collaborative approach), how can we get articles with similar
content?
From my understanding, I think Findory uses some kind of content-based
filtering combined with collaborative filtering. Content-based filtering may be
used to fetch documents with similar content; the best example would be using
something like Lucene's "MoreLikeThis" or "similar" queries. Correct me if I
am wrong.
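Something like this is what I have in mind (just a rough sketch, assuming a
plain Lucene index with a "body" field; the field name and tuning values are
placeholders, and exact package/method names vary a bit by Lucene version):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similar.MoreLikeThis;

// Rough sketch: fetch articles whose text looks like a given article,
// using Lucene's MoreLikeThis query.
public class SimilarArticles {
  public static TopDocs findSimilar(IndexReader reader, int docId, int howMany)
      throws IOException {
    MoreLikeThis mlt = new MoreLikeThis(reader);
    mlt.setFieldNames(new String[] {"body"});  // which field(s) to compare on
    mlt.setMinTermFreq(2);                     // ignore terms rare in the source doc
    mlt.setMinDocFreq(5);                      // ignore terms rare in the index
    Query query = mlt.like(docId);             // query built from the doc's "interesting" terms
    IndexSearcher searcher = new IndexSearcher(reader);
    return searcher.search(query, howMany);    // top content-similar articles
  }
}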
Regards,
-Satish Dandu
-----Original Message-----
From: Sean Owen [mailto:[EMAIL PROTECTED]
Sent: Thursday, 28 August 2008 2:49 PM
To: [email protected]
Subject: Re: Tasty Findory
On Thu, Aug 28, 2008 at 9:20 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> 1. Findory's personalization used a type of hybrid collaborative filtering
> algorithm that recommended articles based on a combination of
> similarity of content and articles that tended to interest other
> Findory users with similar tastes.
Interesting -- yeah, that would be a hybrid of user-based and
item-based approaches.
Usually, in a user-based approach, you find similar users, and then
guess a rating for a new item by averaging those similar users' ratings
for that item -- weighted by the user similarity, of course.
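Roughly, something like this (just a sketch to illustrate, not code from
Taste; the names are made up):

// Sketch of the usual user-based estimate: a similarity-weighted
// average of the neighbors' ratings for the item in question.
static double estimatePreference(double[] neighborSimilarities,
                                 double[] neighborRatings) {
  double weightedSum = 0.0;
  double totalWeight = 0.0;
  for (int i = 0; i < neighborRatings.length; i++) {
    weightedSum += neighborSimilarities[i] * neighborRatings[i];
    totalWeight += neighborSimilarities[i];
  }
  return weightedSum / totalWeight;  // similarity-weighted average
}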
Here, I imagine that in Findory you don't have a rating per se for
articles, just a boolean yes/no. So you substitute a similarity metric
between those items the user has read and a given new item.
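That is, score a candidate article by how similar its content is to what the
user has already read -- something like the following (again only a sketch;
contentSimilarity() stands in for whatever metric gets used):

// Sketch: score a candidate article by its average content similarity
// to the articles the user has already read (boolean data, no ratings).
static double scoreCandidate(String[] articlesRead, String candidate) {
  double total = 0.0;
  for (String read : articlesRead) {
    total += contentSimilarity(read, candidate);
  }
  return total / articlesRead.length;
}

// Stand-in for a real content similarity metric, e.g. cosine similarity
// over term vectors or a Lucene MoreLikeThis score.
static double contentSimilarity(String articleA, String articleB) {
  return articleA.equals(articleB) ? 1.0 : 0.0;  // trivial placeholder
}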
Yeah... that does add up to an interesting new approach, likely. I'd
have to digest that a bit more to think about how to implement it
right.
> The way Findory does this is
> that it pre-computes as much of the expensive personalization as it
> can. Much of the task of matching interests to content is moved to an
> offline batch process. The online task of personalization, the part
> while the user is waiting, is reduced to a few thousand data lookups.
Ah-ha, yeah, computing offline is not surprising. Good news, because
that is the only option for the sorts of parallelization we are
considering via Hadoop.
There is a notion of "Rescorer" in the code which allows for injecting
arbitrary logic to re-rank recommendations. That maps to the "online
personalization" part, and indeed I think that is useful to allow for
some cheap, real-time logic to affect rankings, on top of
recommendations computed offline.
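For example, a Rescorer implementation might look roughly like this (a sketch
of the idea only -- the actual interface in the code may differ, and
Article / getPublishTime() are made up for illustration):

// Simplified stand-in for the Rescorer hook: take each candidate item and
// its offline-computed score, and return an adjusted score.
interface Rescorer<T> {
  double rescore(T thing, double originalScore);
}

// Hypothetical example: boost articles published within the last day, so
// cheap real-time logic affects rankings on top of offline recommendations.
class RecencyRescorer implements Rescorer<Article> {
  private static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;

  public double rescore(Article article, double originalScore) {
    long ageMs = System.currentTimeMillis() - article.getPublishTime();
    return ageMs < ONE_DAY_MS ? originalScore * 1.5 : originalScore;
  }
}

// Minimal Article stand-in for this sketch.
class Article {
  private final long publishTime;  // epoch millis
  Article(long publishTime) { this.publishTime = publishTime; }
  long getPublishTime() { return publishTime; }
}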