Not sure it has a lot to do with data structures? It's not, well,
obvious that one doesn't need to consider every item for
recommendation -- that's the canonical way the algorithms work. In
practice, there's a shortcut.

Technically, it's kind of wrong. There's nothing about recommender
engines per se that dictate you couldn't recommend items that aren't
connected by some co-occurrence. But in practice, it's pretty fine to
take the shortcut.

This is probably something my brain should have gotten to earlier,
but, not really sure it is suggestive of another problem? unless I
miss something.

On Tue, Sep 8, 2009 at 9:15 PM, Ted Dunning<[email protected]> wrote:
> These issues should have been handled pretty transparently by the sparse
> data structures.
>
> They they were not is a red flag that there should be a bit of thought
> applied here.
>
> On Tue, Sep 8, 2009 at 5:44 AM, Sean Owen <[email protected]> wrote:
>
>> I wanted the user base to take note of the change Gokhan suggests
>> below. I committed a variant of it just now which does indeed notably
>> speed up most algorithms by more intelligently selecting possibilities
>> to consider. On one test I am playing with it sped things up by 50% --
>> in another, more like 400%. Depending on your data this could be a big
>> win.
>>
>> Sean
>>
>> On Mon, Sep 7, 2009 at 2:03 PM, Gökhan Çapan<[email protected]> wrote:
>> > Hi, Sean.
>> > I think we talked about mostSimilarItems( ) function before, about a bug
>> in
>> > ItemBasedRecommender.
>> > I think there is another issue, about performance.
>> >
>> > mostSimilarItems function gives the list of most similar items to a given
>> > item.
>> > In computation of those items, the algorithm looks at all other items in
>> > data model, but if there is no user that doesn't rate 2 items together it
>> is
>> > needless to look if there is a similarity between active item and that
>> item.
>> >
>> >
>> >
>> > That is the original function that returns most similar items list in
>> > cf.taste.impl.recommender.GenericItemBasedRecommender:
>> >
>> >  private List<RecommendedItem> doMostSimilarItems(long itemID,
>> >                                                    int howMany,
>> >
>> TopItems.Estimator<Long>
>> > estimator) throws TasteException {
>> >     DataModel model = getDataModel();
>> >     FastIDSet allItemIDs = new FastIDSet(model.getNumItems());
>> >     LongPrimitiveIterator it = model.getItemIDs();
>> >
>> >
>> >     while (it.hasNext()) {
>> >       allItemIDs.add(it.nextLong());
>> >     }
>> >     allItemIDs.remove(itemID);
>> >     return TopItems.getTopItems(howMany, allItemIDs.iterator(), null,
>> > estimator);
>> >   }
>> >
>> >
>> >
>> >
>> > I updated and use it that way:
>> >  private List<RecommendedItem> doMostSimilarItems(long itemID,
>> >                                                    int howMany,
>> >
>> TopItems.Estimator<Long>
>> > estimator) throws TasteException {
>> >     DataModel model = getDataModel();
>> >
>> >       FastIDSet set=new FastIDSet();
>> >       PreferenceArray arr=model.getPreferencesForItem(itemID);
>> >       for(int i=0;i<arr.length();i++){
>> >           set.addAll(model.getItemIDsFromUser(arr.get(i).getUserID()));
>> >       }
>> >       set.remove(itemID);
>> >       return TopItems.getTopItems(howMany,set.iterator(),null,estimator);
>> >   }
>> >
>> >
>> >
>> > The only difference between two function is:
>> > the original one passes all items to getTopItems
>> > mine passes only the items that have at least one user who've rated both
>> > active item and that item.
>> >
>> >
>> >
>> > This little change made the algorithm pretty faster
>> > (For my data set it runs 4 times faster now.)
>> >
>> > I wanted to inform you, if you want to try and update the code.
>> > If for another reason original version of the code is better, please make
>> me
>> > know.
>> >
>> >
>> >
>> >
>> >
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Reply via email to