It can even make things worse in SVD-based algorithms for which preference estimation is very fast.
On Wed, Mar 5, 2014 at 7:00 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
> Hi Sebastian,
> But in order not to select items that are not similar to at least one
> of the items the user interacted with you have to compute the
> similarity with all user items (which is the main task for estimating
> the preference of an item in item-based method). So, it seems to me
> that AllSimilarItemsStrategy does not bring much advantage over
> AllUnknownItemsCandidateItemsStrategy.
>
> On Wed, Mar 5, 2014 at 6:46 PM, Sebastian Schelter <s...@apache.org> wrote:
>>> So both strategies seem to be effectively the same, I don't know what
>>> the implementers had in mind when designing
>>> AllSimilarItemsCandidateItemsStrategy.
>>
>> It can take a long time to estimate preferences for all items a user doesn't
>> know. Especially if you have a lot of items. Traditional item-based
>> recommenders will not recommend any item that is not similar to at least one
>> of the items the user interacted with, so AllSimilarItemsStrategy already
>> selects the maximum set of items that could be potentially recommended to
>> the user.
>>
>> --sebastian
>>
>>
>> On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:
>>>
>>> If the similarity between item 5 and two of the items user 1 preferred are
>>> not NaN then it will return 1, that is what I'm saying. If the
>>> similarities were all NaN then
>>> it will not return it.
>>>
>>> But surely, you might wonder if all similarities between an item and
>>> user's items are NaN, then
>>> AllUnknownItemsCandidateItemsStrategy probably will not return it.
>>>
>>> On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos <jjar...@gmail.com> wrote:
>>>>
>>>> @Tevfik, running this recommender:
>>>>
>>>> GenericItemBasedRecommender itemRecommender = new
>>>> GenericItemBasedRecommender(dataModel, itemSimilarity, new
>>>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new
>>>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
>>>>
>>>>
>>>> With this dataModel:
>>>> 1,1,1.0
>>>> 1,2,2.0
>>>> 1,3,1.0
>>>> 1,4,2.0
>>>> 2,1,1.0
>>>> 2,2,4.0
>>>>
>>>>
>>>> And these similarities
>>>> 1,2,0.1
>>>> 1,3,0.2
>>>> 1,4,0.3
>>>> 2,3,0.5
>>>> 3,4,0.5
>>>> 5,1,0.2
>>>> 5,2,1.0
>>>>
>>>> Returns item 5 for User 1. So item 5 has not been preferred by user 1,
>>>> and the similarity between item 5 and two of the items user 1 preferred are
>>>> not NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item.
>>>> So, I'm truly sorry to insist on this, but I still really do not get the
>>>> difference.
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin
>>>> <tevfik.ayte...@gmail.com> wrote:
>>>>
>>>>> Juan,
>>>>> You got me wrong,
>>>>>
>>>>> AllSimilarItemsCandidateItemsStrategy
>>>>>
>>>>> returns all items that have not been rated by the user and the
>>>>> similarity metric returns a non-NaN similarity value with at
>>>>> least one of the items preferred by the user.
>>>>>
>>>>> So, it does not simply return all items that have not been rated by
>>>>> the user. For example, if there is an item X which has not been rated
>>>>> by the user and if the similarity value between X and at least one of
>>>>> the items rated (preferred) by the user is not NaN, then X will not
>>>>> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
>>>>> returned by AllUnknownItemsCandidateItemsStrategy.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jjar...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Tevfik,
>>>>>>
>>>>>> Thanks for the response.
>>>>>> I think what you say contradicts what Sebastian
>>>>>> pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns
>>>>>> all items that have not been rated by the user, what would
>>>>>> AllUnknownItemsCandidateItemsStrategy return?
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin
>>>>>> <tevfik.ayte...@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry there was a typo in the previous paragraph.
>>>>>>>
>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>
>>>>>>> returns all items that have not been rated by the user and the
>>>>>>> similarity metric returns a non-NaN similarity value with at
>>>>>>> least one of the items preferred by the user.
>>>>>>>
>>>>>>> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Juan,
>>>>>>>>
>>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>>
>>>>>>>> returns all items that have not been rated by the user and the
>>>>>>>> similarity metric returns a non-NaN similarity value that is with at
>>>>>>>> least one of the items preferred by the user.
>>>>>>>>
>>>>>>>> Tevfik
>>>>>>>>
>>>>>>>> On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <s...@apache.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply, Sebastian.
>>>>>>>>>>
>>>>>>>>>> I am not sure if that should be implemented in the Abstract base class
>>>>>>>>>> though because for
>>>>>>>>>> instance PreferredItemsNeighborhoodCandidateItemsStrategy, by definition,
>>>>>>>>>> it returns the item not rated by the user and rated by somebody else.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Good point. So we seem to need special implementations.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Back to my last post, I have been playing around with
>>>>>>>>>> AllSimilarItemsCandidateItemsStrategy
>>>>>>>>>> and AllUnknownItemsCandidateItemsStrategy, and although they both do what I
>>>>>>>>>> wanted (recommend items not previously rated by any user), I honestly can't
>>>>>>>>>> tell the difference between the two strategies. In my tests the output was
>>>>>>>>>> always the same. If the eventual output of the recommender will not include
>>>>>>>>>> items already rated by the user as pointed out here (
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E
>>>>>>>>>> ),
>>>>>>>>>> AllSimilarItemsCandidateItemsStrategy should be equivalent to
>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy, shouldn't it?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> AllSimilarItems returns all items that are similar to any item that the user
>>>>>>>>> already knows. AllUnknownItems simply returns all items that the user has
>>>>>>>>> not interacted with yet.
>>>>>>>>>
>>>>>>>>> These are two different things, although they might overlap in some
>>>>>>>>> scenarios.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <s...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Juan,
>>>>>>>>>>>
>>>>>>>>>>> that is a good catch. CandidateItemsStrategy is the right place to
>>>>>>>>>>> implement this.
>>>>>>>>>>> Maybe we should simply extend its interface to add a
>>>>>>>>>>> parameter that says whether to keep or remove the current user's items?
>>>>>>>>>>>
>>>>>>>>>>> We could even do this in the abstract base class then.
>>>>>>>>>>>
>>>>>>>>>>> --sebastian
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> In case somebody runs into the same situation, the key seems to be in the
>>>>>>>>>>>> CandidateItemsStrategy being passed to the constructor
>>>>>>>>>>>> of GenericItemBasedRecommender. Looking into the code, if no
>>>>>>>>>>>> CandidateItemsStrategy is specified in the
>>>>>>>>>>>> constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is used and
>>>>>>>>>>>> as the documentation says, the doGetCandidateItems method: "returns all
>>>>>>>>>>>> items that have not been rated by the user and that were preferred by
>>>>>>>>>>>> another user that has preferred at least one item that the current user has
>>>>>>>>>>>> preferred too".
>>>>>>>>>>>>
>>>>>>>>>>>> So, a different CandidateItemsStrategy needs to be passed. For this problem,
>>>>>>>>>>>> it seems to me that AllSimilarItemsCandidateItemsStrategy,
>>>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy are good candidates. Does anybody
>>>>>>>>>>>> know where to find some documentation about the different
>>>>>>>>>>>> CandidateItemsStrategy? Based on the name I would say that:
>>>>>>>>>>>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
>>>>>>>>>>>> regardless of whether they have been already rated by someone or not.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items that
>>>>>>>>>>>> have not been rated by anyone yet.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anybody know if it works like that?
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <jjar...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> First thing is that I know this requirement would not make sense in a CF
>>>>>>>>>>>>> Recommender. In my case, I am trying to use Mahout to create something
>>>>>>>>>>>>> closer to a Content-Based Recommender.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In particular, I am pre-computing a similarity matrix between all the
>>>>>>>>>>>>> documents (items) of my catalogue and using that matrix as the
>>>>>>>>>>>>> ItemSimilarity for my Item-Based Recommender.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, when a user rates a document, how could I make the recommender output
>>>>>>>>>>>>> documents similar to the ones the user has already rated even if no other
>>>>>>>>>>>>> user in the system has rated them yet? Is that even possible in the first
>>>>>>>>>>>>> place?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot.
>>>>>>>>>>>>>
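
For anyone following the thread, the distinction Sebastian draws ("all items similar to something the user knows" versus "all items the user has not interacted with") can be illustrated with a small, self-contained Java sketch. This does not use Mahout and is not Mahout's implementation: the class and method names (CandidateStrategySketch, allUnknown, allSimilar, threadSimilarities) are made up for illustration, and a missing similarity pair stands in for NaN. The similarities and user 1's rated items are the ones Juan posted; item 6 is a hypothetical extra catalogue item with no similarity to anything, added only so the two results differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: mimics the two candidate-selection ideas, not Mahout's code. */
public class CandidateStrategySketch {

    // Symmetric item-item similarities; an absent pair stands in for NaN.
    private final Map<Long, Map<Long, Double>> sims = new HashMap<>();

    public void addSimilarity(long a, long b, double value) {
        sims.computeIfAbsent(a, k -> new HashMap<>()).put(b, value);
        sims.computeIfAbsent(b, k -> new HashMap<>()).put(a, value);
    }

    /** "All unknown" idea: every catalogue item the user has not rated. */
    public static Set<Long> allUnknown(Set<Long> catalogue, Set<Long> rated) {
        Set<Long> candidates = new TreeSet<>(catalogue);
        candidates.removeAll(rated);
        return candidates;
    }

    /** "All similar" idea: unrated items with a known similarity to a rated item. */
    public Set<Long> allSimilar(Set<Long> rated) {
        Set<Long> candidates = new TreeSet<>();
        for (Long item : rated) {
            Map<Long, Double> neighbours =
                sims.getOrDefault(item, Collections.<Long, Double>emptyMap());
            candidates.addAll(neighbours.keySet());
        }
        candidates.removeAll(rated);
        return candidates;
    }

    /** The similarity pairs posted in the thread. */
    public static CandidateStrategySketch threadSimilarities() {
        CandidateStrategySketch s = new CandidateStrategySketch();
        s.addSimilarity(1, 2, 0.1);
        s.addSimilarity(1, 3, 0.2);
        s.addSimilarity(1, 4, 0.3);
        s.addSimilarity(2, 3, 0.5);
        s.addSimilarity(3, 4, 0.5);
        s.addSimilarity(5, 1, 0.2);
        s.addSimilarity(5, 2, 1.0);
        return s;
    }

    public static void main(String[] args) {
        // User 1 rated items 1 to 4; item 6 is the hypothetical isolated item.
        Set<Long> rated = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        Set<Long> catalogue = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L));
        CandidateStrategySketch s = threadSimilarities();
        System.out.println("all unknown: " + allUnknown(catalogue, rated)); // [5, 6]
        System.out.println("all similar: " + s.allSimilar(rated));          // [5]
    }
}
```

In this sketch, if the catalogue holds only items 1 to 5, both methods return just item 5, which matches Juan's observation that his tests could not tell the strategies apart. They differ only once some unrated item has no known similarity to anything the user rated.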