Re: Recommend items not rated by any user

Tevfik Aytekin Wed, 05 Mar 2014 09:02:41 -0800

It can even make things worse in SVD-based algorithms for which
preference estimation is very fast.
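
For reference, here is a minimal sketch of the cost argument; it is not Mahout's actual code. A textbook item-based estimate is a similarity-weighted average over the user's items, so scoring a candidate already looks up the same similarities a candidate filter would need, and a candidate with no defined similarity simply comes out as NaN. The class name, the map-based similarity table, and item 6 are assumptions made for illustration; the preferences and similarities are the ones from Juan's example quoted below.

```java
import java.util.HashMap;
import java.util.Map;

final class ItemBasedEstimateSketch {

    // Key for an unordered item pair, so (5,1) and (1,5) share one entry.
    static String pair(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    // Similarities from the example in this thread; a missing pair plays the role of NaN.
    static Map<String, Double> threadSimilarities() {
        Map<String, Double> sims = new HashMap<>();
        sims.put(pair(1, 2), 0.1);
        sims.put(pair(1, 3), 0.2);
        sims.put(pair(1, 4), 0.3);
        sims.put(pair(2, 3), 0.5);
        sims.put(pair(3, 4), 0.5);
        sims.put(pair(5, 1), 0.2);
        sims.put(pair(5, 2), 1.0);
        return sims;
    }

    // User 1's preferences from the example in this thread.
    static Map<Long, Double> user1Preferences() {
        Map<Long, Double> prefs = new HashMap<>();
        prefs.put(1L, 1.0);
        prefs.put(2L, 2.0);
        prefs.put(3L, 1.0);
        prefs.put(4L, 2.0);
        return prefs;
    }

    // Similarity-weighted average of the user's ratings (assumed textbook form).
    // Returns NaN when the candidate has no defined similarity to any rated item.
    static double estimate(Map<Long, Double> prefs, Map<String, Double> sims, long candidate) {
        double weighted = 0.0;
        double total = 0.0;
        for (Map.Entry<Long, Double> pref : prefs.entrySet()) {
            Double sim = sims.get(pair(candidate, pref.getKey()));
            if (sim != null) {
                weighted += sim * pref.getValue();
                total += Math.abs(sim);
            }
        }
        return total == 0.0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        Map<String, Double> sims = threadSimilarities();
        Map<Long, Double> prefs = user1Preferences();
        System.out.println("item 5: " + estimate(prefs, sims, 5L));
        // Item 6 is hypothetical: it has no similarity entries at all.
        System.out.println("item 6: " + estimate(prefs, sims, 6L));
    }
}
```

In this sketch the loop over the user's items is the whole cost of an estimate, which is why filtering candidates by "similar to at least one user item" does roughly the same work as estimating them.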


On Wed, Mar 5, 2014 at 7:00 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
> Hi Sebastian,
> But in order not to select items that are not similar to at least one
> of the items the user interacted with you have to compute the
> similarity with all user items (which is the main task for estimating
> the preference of an item in item-based method). So, it seems to me
> that AllSimilarItemsStrategy does not bring much advantage over
> AllUnknownItemsCandidateItemsStrategy.
>
> On Wed, Mar 5, 2014 at 6:46 PM, Sebastian Schelter <s...@apache.org> wrote:
>>> So both strategies seem to be effectively the same, I don't know what
>>> the implementers had in mind when designing
>>> AllSimilarItemsCandidateItemsStrategy.
>>
>> It can take a long time to estimate preferences for all items a user doesn't
>> know. Especially if you have a lot of items. Traditional item-based
>> recommenders will not recommend any item that is not similar to at least one
>> of the items the user interacted with, so AllSimilarItemsStrategy already
>> selects the maximum set of items that could be potentially recommended to
>> the user.
>>
>> --sebastian
>>
>>
>>
>>
>> On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:
>>>
>>> If the similarities between item 5 and two of the items user 1
>>> preferred are not NaN, then it will return it; that is what I'm saying.
>>> If the similarities were all NaN, then it would not return it.
>>>
>>> But surely, you might wonder: if all similarities between an item and
>>> the user's items are NaN, then AllUnknownItemsCandidateItemsStrategy
>>> probably will not return it.
>>>
>>
>>> On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos <jjar...@gmail.com> wrote:
>>>>
>>>> @Tevfik, running this recommender:
>>>>
>>>> GenericItemBasedRecommender itemRecommender = new
>>>> GenericItemBasedRecommender(dataModel, itemSimilarity, new
>>>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new
>>>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
>>>>
>>>>
>>>> With this dataModel:
>>>> 1,1,1.0
>>>> 1,2,2.0
>>>> 1,3,1.0
>>>> 1,4,2.0
>>>> 2,1,1.0
>>>> 2,2,4.0
>>>>
>>>>
>>>> And these similarities
>>>> 1,2,0.1
>>>> 1,3,0.2
>>>> 1,4,0.3
>>>> 2,3,0.5
>>>> 3,4,0.5
>>>> 5,1,0.2
>>>> 5,2,1.0
>>>>
>>>> Returns item 5 for User 1. So item 5 has not been preferred by user 1,
>>>> and the similarities between item 5 and two of the items user 1
>>>> preferred are not NaN, but AllSimilarItemsCandidateItemsStrategy is
>>>> returning that item. So, I'm truly sorry to insist on this, but I still
>>>> really do not get the difference.
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
>>>>
>>>>> Juan,
>>>>> You got me wrong,
>>>>>
>>>>> AllSimilarItemsCandidateItemsStrategy
>>>>>
>>>>> returns all items that have not been rated by the user and the
>>>>> similarity metric returns a non-NaN similarity value with at
>>>>> least one of the items preferred by the user.
>>>>>
>>>>> So, it does not simply return all items that have not been rated by
>>>>> the user. For example, if there is an item X which has not been rated
>>>>> by the user and if the similarity value between X and at least one of
>>>>> the items rated (preferred) by the user is not NaN, then X will not
>>>>> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
>>>>> returned by AllUnknownItemsCandidateItemsStrategy.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jjar...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Tevfik,
>>>>>>
>>>>>> Thanks for the response. I think what you say contradicts what
>>>>>> Sebastian pointed out before. Also, if
>>>>>> AllSimilarItemsCandidateItemsStrategy returns all items that have not
>>>>>> been rated by the user, what would
>>>>>> AllUnknownItemsCandidateItemsStrategy return?
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry there was a typo in the previous paragraph.
>>>>>>>
>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>
>>>>>>> returns all items that have not been rated by the user and the
>>>>>>> similarity metric returns a non-NaN similarity value with at
>>>>>>> least one of the items preferred by the user.
>>>>>>>
>>>>>>> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Juan,
>>>>>>>>
>>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>>
>>>>>>>> returns all items that have not been rated by the user and the
>>>>>>>> similarity metric returns a non-NaN similarity value that is with at
>>>>>>>> least one of the items preferred by the user.
>>>>>>>>
>>>>>>>> Tevfik
>>>>>>>>
>>>>>>>> On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <s...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply, Sebastian.
>>>>>>>>>>
>>>>>>>>>> I am not sure that should be implemented in the abstract base class
>>>>>>>>>> though, because PreferredItemsNeighborhoodCandidateItemsStrategy, for
>>>>>>>>>> instance, by definition returns the items not rated by the user and
>>>>>>>>>> rated by somebody else.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Good point. So we seem to need special implementations.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Back to my last post, I have been playing around with
>>>>>>>>>> AllSimilarItemsCandidateItemsStrategy and
>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy, and although they both do what
>>>>>>>>>> I wanted (recommend items not previously rated by any user), I
>>>>>>>>>> honestly can't tell the difference between the two strategies. In my
>>>>>>>>>> tests the output was always the same. If the eventual output of the
>>>>>>>>>> recommender will not include items already rated by the user, as
>>>>>>>>>> pointed out here (
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E
>>>>>>>>>> ), AllSimilarItemsCandidateItemsStrategy should be equivalent to
>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy, shouldn't it?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> AllSimilarItems returns all items that are similar to any item that
>>>>>>>>> the user already knows. AllUnknownItems simply returns all items that
>>>>>>>>> the user has not interacted with yet.
>>>>>>>>>
>>>>>>>>> These are two different things, although they might overlap in some
>>>>>>>>> scenarios.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <s...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Juan,
>>>>>>>>>>>
>>>>>>>>>>> that is a good catch. CandidateItemsStrategy is the right place to
>>>>>>>>>>> implement this. Maybe we should simply extend its interface to add a
>>>>>>>>>>> parameter that says whether to keep or remove the current user's
>>>>>>>>>>> items?
>>>>>>>>>>>
>>>>>>>>>>> We could even do this in the abstract base class then.
>>>>>>>>>>>
>>>>>>>>>>> --sebastian
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In case somebody runs into the same situation, the key seems to be
>>>>>>>>>>>> in the CandidateItemsStrategy being passed to the constructor of
>>>>>>>>>>>> GenericItemBasedRecommender. Looking into the code, if no
>>>>>>>>>>>> CandidateItemsStrategy is specified in the constructor,
>>>>>>>>>>>> PreferredItemsNeighborhoodCandidateItemsStrategy is used and, as the
>>>>>>>>>>>> documentation says, the doGetCandidateItems method "returns all
>>>>>>>>>>>> items that have not been rated by the user and that were preferred
>>>>>>>>>>>> by another user that has preferred at least one item that the
>>>>>>>>>>>> current user has preferred too".
>>>>>>>>>>>>
>>>>>>>>>>>> So, a different CandidateItemsStrategy needs to be passed. For this
>>>>>>>>>>>> problem, it seems to me that AllSimilarItemsCandidateItemsStrategy
>>>>>>>>>>>> and AllUnknownItemsCandidateItemsStrategy are good candidates. Does
>>>>>>>>>>>> anybody know where to find some documentation about the different
>>>>>>>>>>>> CandidateItemsStrategy implementations? Based on the names I would
>>>>>>>>>>>> say that:
>>>>>>>>>>>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
>>>>>>>>>>>> regardless of whether they have been already rated by someone or not.
>>>>>>>>>>>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items
>>>>>>>>>>>> that have not been rated by anyone yet.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anybody know if it works like that?
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <jjar...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> First thing is that I know this requirement would not make sense in
>>>>>>>>>>>>> a CF Recommender. In my case, I am trying to use Mahout to create
>>>>>>>>>>>>> something closer to a Content-Based Recommender.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In particular, I am pre-computing a similarity matrix between all the
>>>>>>>>>>>>> documents (items) of my catalogue and using that matrix as the
>>>>>>>>>>>>> ItemSimilarity for my Item-Based Recommender.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, when a user rates a document, how could I make the recommender
>>>>>>>>>>>>> output documents similar to the ones the user has already rated, even
>>>>>>>>>>>>> if no other user in the system has rated them yet? Is that even
>>>>>>>>>>>>> possible in the first place?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>
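
The two strategies, as they are described in this thread, can be sketched with plain collections. This illustrates those descriptions only and is not Mahout's implementation: the explicit catalogue set, the class and method names, and item 6 (an item with no similarity entries) are all assumptions. The similarities and user 1's rated items are taken from Juan's example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CandidateStrategySketch {

    // Symmetric similarity table; a missing entry plays the role of NaN.
    private final Map<Long, Map<Long, Double>> sims = new HashMap<>();

    void addSimilarity(long a, long b, double value) {
        sims.computeIfAbsent(a, k -> new HashMap<>()).put(b, value);
        sims.computeIfAbsent(b, k -> new HashMap<>()).put(a, value);
    }

    // "All unknown": every catalogue item the user has not rated.
    static Set<Long> allUnknown(Set<Long> catalogue, Set<Long> rated) {
        Set<Long> out = new TreeSet<>(catalogue);
        out.removeAll(rated);
        return out;
    }

    // "All similar": unrated items with a defined similarity to at least one rated item.
    Set<Long> allSimilar(Set<Long> rated) {
        Set<Long> out = new TreeSet<>();
        for (long item : rated) {
            Map<Long, Double> neighbours = sims.getOrDefault(item, Collections.emptyMap());
            out.addAll(neighbours.keySet());
        }
        out.removeAll(rated);
        return out;
    }

    // Similarities from the example in this thread.
    static CandidateStrategySketch threadExample() {
        CandidateStrategySketch s = new CandidateStrategySketch();
        s.addSimilarity(1, 2, 0.1);
        s.addSimilarity(1, 3, 0.2);
        s.addSimilarity(1, 4, 0.3);
        s.addSimilarity(2, 3, 0.5);
        s.addSimilarity(3, 4, 0.5);
        s.addSimilarity(5, 1, 0.2);
        s.addSimilarity(5, 2, 1.0);
        return s;
    }

    public static void main(String[] args) {
        CandidateStrategySketch s = threadExample();
        // Item 6 is hypothetical: in the catalogue, but similar to nothing.
        Set<Long> catalogue = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L));
        Set<Long> user1 = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        System.out.println("all unknown: " + allUnknown(catalogue, user1));
        System.out.println("all similar: " + s.allSimilar(user1));
    }
}
```

With only the data from the thread, both methods give the same candidates for user 1, which matches what Juan observed; the two sets differ only once the catalogue contains an item such as the hypothetical item 6 that has no similarity to anything the user rated.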
