Sebastian,
The current recommender implementation does not distinguish between a
'browsed' item and a 'purchased' item when calculating similarity.
So the first option is a post-processing step: for a given item, remove the
similar items that were not purchased.
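A minimal sketch of that post step, in Python for brevity. It is not Mahout
code: the function name, the (item ID, similarity) pairs and the set of
purchased IDs are all assumptions made for illustration. It only shows the
filtering idea applied to whatever a mostSimilarItems()-style call returns.

```python
def most_similar_purchased(similar_items, purchased_ids, how_many):
    """Keep only candidates that were purchased, preserving similarity order.

    similar_items: list of (item_id, similarity), already sorted best-first.
    purchased_ids: set of item IDs that have at least one purchase.
    """
    kept = [(item, score) for item, score in similar_items
            if item in purchased_ids]
    return kept[:how_many]


# Hypothetical candidates, as a similarity lookup might return them.
candidates = [(101, 0.92), (205, 0.88), (317, 0.75), (422, 0.61)]
purchased = {205, 422, 999}

top = most_similar_purchased(candidates, purchased, how_many=2)
print(top)
```

Because the filter shrinks the list, you would probably want to request more
candidates than you finally need and cut down afterwards.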
The second option is to extend the 'Preference' interface with an API to get
the type information. You would then also need to provide an appropriate
implementation (the default is GenericPreference) and add a
doMostSimilarPurchasedItems() method to GenericItemBasedRecommender, along with
a few other changes. Obviously this is more work.
With the FP mining algorithm, the simplest thing is to retain only the itemsets
that contain purchased items instead of modifying the algorithm itself. This
may result in interesting frequent itemsets where two different types of items
were browsed and purchased together.
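A rough sketch of that retention step, again in Python and not tied to the
Mahout FP-Growth output format. It assumes each item is tagged with its
activity type before mining, e.g. ("browse", "A") and ("purchase", "C"), and
that the miner hands back (itemset, support) pairs. Those names and that
encoding are my assumptions.

```python
def has_kind(items, kind):
    """True if the itemset contains at least one item of the given kind."""
    return any(item_kind == kind for item_kind, _ in items)


def retain_purchase_itemsets(itemsets):
    """Keep only frequent itemsets that contain a purchased item."""
    return [(items, support) for items, support in itemsets
            if has_kind(items, "purchase")]


def cross_type_itemsets(itemsets):
    """Keep only itemsets that mix browsed and purchased items."""
    return [(items, support) for items, support in itemsets
            if has_kind(items, "browse") and has_kind(items, "purchase")]


# Hypothetical mined itemsets with their support counts.
mined = [
    (frozenset({("browse", "A"), ("browse", "B")}), 40),
    (frozenset({("browse", "A"), ("purchase", "C")}), 25),
    (frozenset({("purchase", "C"), ("purchase", "D")}), 12),
]

with_purchase = retain_purchase_itemsets(mined)
browse_then_buy = cross_type_itemsets(mined)
print(len(with_purchase), len(browse_then_buy))
```

The second filter is the "browsed this, purchased that" case: only itemsets
containing both kinds of item survive.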
-...@nkur
On 4/15/10 5:40 PM, "Sebastian Feher" <[email protected]> wrote:
Robin, Sebastian, Sean, thanks for your responses.
Yes, that is exactly what I am looking for: computing frequent item sets based
on co-browse, co-purchase, co-searching, user-item ratings and other user-item
activities, and then using these frequent item sets to provide recommendations
for an active item and/or an active user.
Regarding GenericItemBasedRecommender.mostSimilarItems(), I've used both
Tanimoto and a custom similarity function that works the same way as the
custom-coded frequent item sets algorithm that I'm trying to replace and test
with Mahout.
There are a few questions that I'm not able to answer:
- Do you support cross-type frequent item sets? For example: people who
browsed this item ended up purchasing these items. In this case the item
pairs are generated by taking one item from the Browse space and the other from
the Purchase space. Is this something that can be achieved with the current
algorithms (GenericItemBasedRecommender.mostSimilarItems(), FP-Growth) in their
existing form? If not, is there an extension mechanism that allows me to do
that in a clean fashion, or do I have to modify the algorithm code?
Thanks
On Apr 14, 2010, at 11:46 AM, Sebastian Schelter wrote:
> Hi Sebastian,
>
> I can only help you with what
> GenericItemBasedRecommender.mostSimilarItems() does. It's basically what
> you know from amazon.com: "People who like this item also like the
> following items". Mathematically speaking, you have a matrix of the
> preferences of users towards items, and mostSimilarItems() searches for
> the highest-ranking item vectors using some similarity function (usually
> cosine or Pearson correlation).
>
> A good overview of how item-based collaborative filtering works and
> what the most similar items are can be found in this paper (helped me
> understand the whole issue):
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.9927&rep=rep1&type=pdf
>
> Regards,
> Sebastian
>
> Sebastian Feher wrote:
>> Hi All,
>>
>> I'm looking at extracting association rules with Mahout. If I understand it
>> correctly, both GenericItemBasedRecommender.mostSimilarItems() and Parallel
>> FP-Growth seem to provide support for doing that. Is this true? If not what
>> are the major differences between the two (including scalability,
>> performance)? Thanks.
>>
>> Sebastian
>