These issues should have been handled pretty transparently by the sparse
data structures.

That they were not is a red flag that a bit more thought should be
applied here.
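
As a rough illustration of the candidate selection discussed in the quoted
message below, here is a minimal sketch using plain Java collections rather
than the Taste classes. The class name CoRatedCandidates, its two methods, and
the user-to-items and item-to-users maps are hypothetical stand-ins and are not
part of Mahout. The sketch only shows how a sparse item-to-users index yields
the co-rated items without scanning every item.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Mahout.
final class CoRatedCandidates {

  private CoRatedCandidates() {
  }

  /** Builds the sparse item -> users index from a user -> items map. */
  static Map<Long, Set<Long>> invert(Map<Long, Set<Long>> itemsByUser) {
    Map<Long, Set<Long>> usersByItem = new HashMap<>();
    for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
      for (Long item : entry.getValue()) {
        usersByItem.computeIfAbsent(item, k -> new HashSet<>()).add(entry.getKey());
      }
    }
    return usersByItem;
  }

  /**
   * Returns every item that shares at least one rating user with itemID,
   * excluding itemID itself. Items with no common user are never visited.
   */
  static Set<Long> candidates(Map<Long, Set<Long>> itemsByUser,
                              Map<Long, Set<Long>> usersByItem,
                              long itemID) {
    Set<Long> result = new HashSet<>();
    Set<Long> raters = usersByItem.getOrDefault(itemID, Collections.<Long>emptySet());
    for (Long user : raters) {
      // Union of everything rated by each user who rated itemID.
      result.addAll(itemsByUser.getOrDefault(user, Collections.<Long>emptySet()));
    }
    result.remove(itemID);
    return result;
  }
}
```

With this layout the work is proportional to the number of ratings made by
users who rated the item, not to the total number of items in the model.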

On Tue, Sep 8, 2009 at 5:44 AM, Sean Owen <[email protected]> wrote:

> I wanted the user base to take note of the change Gokhan suggests
> below. I committed a variant of it just now which does indeed notably
> speed up most algorithms by more intelligently selecting possibilities
> to consider. On one test I am playing with it sped things up by 50% --
> in another, more like 400%. Depending on your data this could be a big
> win.
>
> Sean
>
> On Mon, Sep 7, 2009 at 2:03 PM, Gökhan Çapan<[email protected]> wrote:
> > Hi, Sean.
> > I think we talked about the mostSimilarItems() function before, about a
> > bug in ItemBasedRecommender.
> > I think there is another issue, about performance.
> >
> > The mostSimilarItems function returns the list of items most similar to a
> > given item.
> > To compute them, the algorithm looks at all other items in the data model,
> > but if no user has rated both items, there is no need to check for a
> > similarity between the active item and that item.
> >
> >
> >
> > This is the original function that returns the list of most similar items,
> > in cf.taste.impl.recommender.GenericItemBasedRecommender:
> >
> >  private List<RecommendedItem> doMostSimilarItems(long itemID,
> >                                                    int howMany,
> >                                                    TopItems.Estimator<Long> estimator)
> >       throws TasteException {
> >     DataModel model = getDataModel();
> >     FastIDSet allItemIDs = new FastIDSet(model.getNumItems());
> >     LongPrimitiveIterator it = model.getItemIDs();
> >
> >
> >     while (it.hasNext()) {
> >       allItemIDs.add(it.nextLong());
> >     }
> >     allItemIDs.remove(itemID);
> >     return TopItems.getTopItems(howMany, allItemIDs.iterator(), null, estimator);
> >   }
> >
> >
> >
> >
> > I updated it and now use it this way:
> >  private List<RecommendedItem> doMostSimilarItems(long itemID,
> >                                                    int howMany,
> >                                                    TopItems.Estimator<Long> estimator)
> >       throws TasteException {
> >     DataModel model = getDataModel();
> >
> >     FastIDSet set = new FastIDSet();
> >     PreferenceArray arr = model.getPreferencesForItem(itemID);
> >     for (int i = 0; i < arr.length(); i++) {
> >       set.addAll(model.getItemIDsFromUser(arr.get(i).getUserID()));
> >     }
> >     set.remove(itemID);
> >     return TopItems.getTopItems(howMany, set.iterator(), null, estimator);
> >   }
> >
> >
> >
> > The only difference between the two functions is:
> > the original one passes all items to getTopItems;
> > mine passes only the items for which at least one user has rated both
> > the active item and that item.
> >
> >
> >
> > This small change made the algorithm considerably faster.
> > (For my data set it runs 4 times faster now.)
> >
> > I wanted to let you know, in case you want to try it and update the code.
> > If the original version of the code is better for some other reason,
> > please let me know.
> >
> >
> >
> >
> >
>



-- 
Ted Dunning, CTO
DeepDyve
