On Fri, Apr 16, 2010 at 9:41 AM, Sebastian Feher <[email protected]> wrote:
> Given a data set with two users with their associated browsing and order
> history:
Yes, exactly so.
> This lets me provide a simplistic recommendation:
> Browsed P2 -> Purchased : P2 2 times/50%, P3 1 time/25%, P4 1 time/25%
I'd say it differently, and for clarity, I think we should distinguish
browsed items ("users") from purchased items ("items"). Let's call
them B1..n and P1..n respectively.
What you have so far is that "user" B2 has an association of strength
2 to P2, 1 to P3 and 1 to P4. These are preference values in the
framework, like movie ratings.
You can make the 'recommendations' you describe above, though to
compute that you really don't need a recommender. It's just counting
things and returning top counts.
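To make the counting concrete, here is a rough sketch in plain Java. It is not framework code: the class BrowseToPurchaseCounts and its methods are made up for illustration, and item IDs are plain strings like "B2" and "P2" as in the example above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any framework:
// tallies how often a purchase of Pn followed browsing Bn.
class BrowseToPurchaseCounts {

    // counts.get(browsed).get(purchased) = number of observations
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one "browsed X, then purchased Y" observation.
    void observe(String browsed, String purchased) {
        counts.computeIfAbsent(browsed, k -> new HashMap<>())
              .merge(purchased, 1, Integer::sum);
    }

    // Number of times this browsed -> purchased pair was seen (0 if never).
    int count(String browsed, String purchased) {
        return counts.getOrDefault(browsed, new HashMap<>())
                     .getOrDefault(purchased, 0);
    }

    // Up to howMany purchased items for a browsed item,
    // highest count first; ties are broken by item ID.
    List<String> top(String browsed, int howMany) {
        Map<String, Integer> forItem = counts.getOrDefault(browsed, new HashMap<>());
        List<String> items = new ArrayList<>(forItem.keySet());
        items.sort((a, b) -> {
            int byCount = Integer.compare(forItem.get(b), forItem.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }
}
```

Feeding it the four B2 observations from your example (P2 twice, P3 and P4 once each) gives a count of 2 for B2->P2, and top("B2", 1) returns P2.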
What a recommender could do here is to discover *new* Pn (maybe P7,
P8, P9...) that are likely to be purchased after browsing a given
item, but which have never been observed before. That may or may not
be of interest, but that's one way to apply recommendations here.
> Can you clarify "that can be done as post step where you remove similar items
> for a given item that were not purchased"? How would it work in this case?
This is a good point, but the opposite is actually the problem, in my
construction. The recommender doesn't tell you anything about known
Bn->Pn associations -- it figures you already know about those. But in
this case, perhaps it's really most interesting to stick to the
observed Bn->Pn associations.
To Ankur's comment -- he's suggesting a different construction. He's
talking about using your users as users, and items as items, and
determining item-item similarity from user-item associations, which
you might infer from browsing and purchasing. This is what I was
speaking about initially too.
Back in that world, you could just compute mostSimilarItems() to an
item being browsed to come up with a list of suggestions. This isn't
really recommendation either. But Ankur's just pointing out that you
have to filter that list, because it may contain items the user has
already bought.
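A minimal sketch of that post-filtering step, in plain Java. It assumes the similar-item IDs have already come back as an ordered list (standing in for whatever mostSimilarItems() returned) and that you hold the user's purchased item IDs in a set. The class and method names are hypothetical, not framework API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of any framework.
class SimilarItemFilter {

    // Returns the similar-item IDs in their original order,
    // minus anything the user has already purchased.
    static List<Long> removePurchased(List<Long> similarItemIDs, Set<Long> purchasedItemIDs) {
        List<Long> kept = new ArrayList<>();
        for (Long itemID : similarItemIDs) {
            if (!purchasedItemIDs.contains(itemID)) {
                kept.add(itemID);
            }
        }
        return kept;
    }
}
```

Note that the filtered list can come back shorter than the number of items originally requested.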
Actually, the framework can distinguish browsed and purchased items,
without resorting to implementing Preference, which I don't recommend.
mostSimilarItems() takes a Rescorer object which can be used to reject
the item IDs of things the user has purchased. This is better than
filtering the list after it has been returned to you for a couple of
reasons.