@Tevfik, running this recommender:

    GenericItemBasedRecommender itemRecommender = new GenericItemBasedRecommender(
        dataModel,
        itemSimilarity,
        new AllSimilarItemsCandidateItemsStrategy(itemSimilarity),
        new AllSimilarItemsCandidateItemsStrategy(itemSimilarity));

with this dataModel:

    1,1,1.0
    1,2,2.0
    1,3,1.0
    1,4,2.0
    2,1,1.0
    2,2,4.0

and these similarities:

    1,2,0.1
    1,3,0.2
    1,4,0.3
    2,3,0.5
    3,4,0.5
    5,1,0.2
    5,2,1.0

returns item 5 for user 1.

So item 5 has not been preferred by user 1, and the similarities between item 5 and two of the items user 1 preferred (items 1 and 2) are not NaN, yet AllSimilarItemsCandidateItemsStrategy is returning that item.

So, I'm truly sorry to insist on this, but I still really do not get the difference.

On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:

> Juan,
> You got me wrong,
>
> AllSimilarItemsCandidateItemsStrategy
>
> returns all items that have not been rated by the user and the
> similarity metric returns a non-NaN similarity value with at
> least one of the items preferred by the user.
>
> So, it does not simply return all items that have not been rated by
> the user. For example, if there is an item X which has not been rated
> by the user and if the similarity value between X and at least one of
> the items rated (preferred) by the user is not NaN, then X will be not
> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
> returned by AllUnknownItemsCandidateItemsStrategy.
>
>
> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jjar...@gmail.com> wrote:
> > Hi Tefik,
> >
> > Thanks for the response. I think what you says contradicts what Sebastian
> > pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns
> > all items that have not been rated by the user, what would
> > AllUnknownItemsCandidateItemsStrategy return?
> >
> >
> > On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com>
> > wrote:
> >
> >> Sorry there was a typo in the previous paragraph.
> >>
> >> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
> >>
> >> returns all items that have not been rated by the user and the
> >> similarity metric returns a non-NaN similarity value with at
> >> least one of the items preferred by the user.
> >>
> >> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com>
> >> wrote:
> >> > Hi Juan,
> >> >
> >> > If I remember correctly, AllSimilarItemsCandidateItemsStrategy
> >> >
> >> > returns all items that have not been rated by the user and the
> >> > similarity metric returns a non-NaN similarity value that is with at
> >> > least one of the items preferred by the user.
> >> >
> >> > Tevfik
> >> >
> >> > On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <s...@apache.org>
> >> > wrote:
> >> >> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
> >> >>>
> >> >>> Thanks for the reply, Sebastian.
> >> >>>
> >> >>> I am not sure if that should be implemented in the Abstract base class
> >> >>> though because for
> >> >>> instance PreferredItemsNeighborhoodCandidateItemsStrategy, by definition,
> >> >>> it returns the item not rated by the user and rated by somebody else.
> >> >>
> >> >>
> >> >> Good point. So we seem to need special implementations.
> >> >>
> >> >>
> >> >>>
> >> >>> Back to my last post, I have been playing around with
> >> >>> AllSimilarItemsCandidateItemsStrategy
> >> >>> and AllUnknownItemsCandidateItemsStrategy, and although they both do
> >> >>> what I wanted (recommend items not previously rated by any user), I
> >> >>> honestly can't tell the difference between the two strategies. In my
> >> >>> tests the output was always the same. If the eventual output of the
> >> >>> recommender will not include items already rated by the user as
> >> >>> pointed out here (
> >> >>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E
> >> >>> ),
> >> >>> AllSimilarItemsCandidateItemsStrategy should be equivalent to
> >> >>> AllUnkownItemsCandidateItemsStrategy, shouldn't it?
> >> >>
> >> >>
> >> >> AllSimilarItems returns all items that are similar to any item that
> >> >> the user already knows. AllUnknownItems simply returns all items that
> >> >> the user has not interacted with yet.
> >> >>
> >> >> These are two different things, although they might overlap in some
> >> >> scenarios.
> >> >>
> >> >> Best,
> >> >> Sebastian
> >> >>
> >> >>>
> >> >>> Thanks.
> >> >>>
> >> >>> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <s...@apache.org>
> >> >>> wrote:
> >> >>>>
> >> >>>> Hi Juan,
> >> >>>>
> >> >>>> that is a good catch. CandidateItemsStrategy is the right place to
> >> >>>> implement this. Maybe we should simply extend its interface to add a
> >> >>>> parameter that says whether to keep or remove the current users items?
> >> >>>>
> >> >>>> We could even do this in the abstract base class then.
> >> >>>>
> >> >>>> --sebastian
> >> >>>>
> >> >>>>
> >> >>>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
> >> >>>>>
> >> >>>>> In case somebody runs into the same situation, the key seems to be
> >> >>>>> in the CandidateItemStrategy being passed to the constructor
> >> >>>>> of GenericItemBasedRecommender. Looking into the code, if no
> >> >>>>> CandidateItemStrategy is specified in the
> >> >>>>> constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is
> >> >>>>> used and as the documentation says, the doGetCandidateItems method:
> >> >>>>> "returns all items that have not been rated by the user and that
> >> >>>>> were preferred by another user that has preferred at least one item
> >> >>>>> that the current user has preferred too".
> >> >>>>>
> >> >>>>> So, a different CandidateItemStrategy needs to be passed.
> >> >>>>> For this problem,
> >> >>>>> it seems to me that AllSimilarItemsCandidateItemsStrategy,
> >> >>>>> AllUnknownItemsCandidateItemsStrategy are good candidates. Does
> >> >>>>> anybody know where to find some documentation about the different
> >> >>>>> CandidateItemStrategy? Based on the name I would say that:
> >> >>>>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
> >> >>>>> regardless of whether they have been already rated by someone or not.
> >> >>>>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items
> >> >>>>> that have not been rated by anyone yet.
> >> >>>>>
> >> >>>>> Does anybody know if it works like that?
> >> >>>>> Thanks.
> >> >>>>>
> >> >>>>>
> >> >>>>> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <jjar...@gmail.com>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> First thing is that I know this requirement would not make sense in
> >> >>>>>> a CF Recommender. In my case, I am trying to use Mahout to create
> >> >>>>>> something closer to a Content-Based Recommender.
> >> >>>>>>
> >> >>>>>> In particular, I am pre-computing a similarity matrix between all
> >> >>>>>> the documents (items) of my catalogue and using that matrix as the
> >> >>>>>> ItemSimilarity for my Item-Based Recommender.
> >> >>>>>>
> >> >>>>>> So, when a user rates a document, how could I make the recommender
> >> >>>>>> outputs similar documents to that ones the user has already rated
> >> >>>>>> even if no other user in the system has rated them yet? Is that
> >> >>>>>> even possible in the first place?
> >> >>>>>>
> >> >>>>>> Thanks a lot.
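
For anyone following the thread, here is a minimal sketch in plain Java (deliberately not Mahout code) of the two candidate rules as Sebastian words them in this thread: "all similar" keeps every item that has a non-NaN similarity to at least one item the user preferred, and "all unknown" keeps every item the user has not interacted with. The class name, method names, and array layout are hypothetical and exist only for illustration. One assumption is worth flagging loudly: the sketch takes "all items" in the unknown rule to mean the items that appear in the preference data, which the thread does not confirm. The data is the small example from the top of this message.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative stand-in only; none of these names come from Mahout.
class CandidateStrategySketch {

    // Preference rows as {user, item, preference}, copied from the example data model.
    static final double[][] PREFERENCES = {
        {1, 1, 1.0}, {1, 2, 2.0}, {1, 3, 1.0}, {1, 4, 2.0}, {2, 1, 1.0}, {2, 2, 4.0}
    };

    // Similarity rows as {itemA, itemB, similarity}, copied from the example similarities.
    static final double[][] SIMILARITIES = {
        {1, 2, 0.1}, {1, 3, 0.2}, {1, 4, 0.3}, {2, 3, 0.5},
        {3, 4, 0.5}, {5, 1, 0.2}, {5, 2, 1.0}
    };

    // Items the given user has a preference for.
    static Set<Long> preferredBy(long user) {
        Set<Long> items = new TreeSet<>();
        for (double[] p : PREFERENCES) {
            if ((long) p[0] == user) {
                items.add((long) p[1]);
            }
        }
        return items;
    }

    // "All similar" rule: items with a non-NaN similarity to at least one
    // preferred item, minus the items the user already preferred.
    static Set<Long> allSimilarCandidates(long user) {
        Set<Long> preferred = preferredBy(user);
        Set<Long> candidates = new TreeSet<>();
        for (double[] s : SIMILARITIES) {
            if (Double.isNaN(s[2])) {
                continue;
            }
            long a = (long) s[0];
            long b = (long) s[1];
            if (preferred.contains(a)) {
                candidates.add(b);
            }
            if (preferred.contains(b)) {
                candidates.add(a);
            }
        }
        candidates.removeAll(preferred);
        return candidates;
    }

    // "All unknown" rule: every item in the preference data, minus the items
    // the user already preferred. ASSUMPTION: items that only appear in the
    // similarity data (item 5 here) are not part of "all items".
    static Set<Long> allUnknownCandidates(long user) {
        Set<Long> candidates = new TreeSet<>();
        for (double[] p : PREFERENCES) {
            candidates.add((long) p[1]);
        }
        candidates.removeAll(preferredBy(user));
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println("allSimilar(1) = " + allSimilarCandidates(1)); // prints allSimilar(1) = [5]
        System.out.println("allUnknown(1) = " + allUnknownCandidates(1)); // prints allUnknown(1) = []
        System.out.println("allSimilar(2) = " + allSimilarCandidates(2)); // prints allSimilar(2) = [3, 4, 5]
        System.out.println("allUnknown(2) = " + allUnknownCandidates(2)); // prints allUnknown(2) = [3, 4]
    }
}
```

Under that assumption the two rules already disagree on this data: for user 1 the similar rule yields item 5 while the unknown rule yields nothing, because item 5 is never rated by anyone. Whether the real strategies behave this way depends on how each one enumerates items, so treat this as a way to frame the question rather than an answer to it.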