These issues should have been handled pretty transparently by the sparse data structures.
That they were not is a red flag that there should be a bit of thought applied here.

On Tue, Sep 8, 2009 at 5:44 AM, Sean Owen <[email protected]> wrote:

> I wanted the user base to take note of the change Gokhan suggests
> below. I committed a variant of it just now which does indeed notably
> speed up most algorithms by more intelligently selecting possibilities
> to consider. On one test I am playing with it sped things up by 50% --
> in another, more like 400%. Depending on your data this could be a big
> win.
>
> Sean
>
> On Mon, Sep 7, 2009 at 2:03 PM, Gökhan Çapan<[email protected]> wrote:
> > Hi, Sean.
> > I think we talked about the mostSimilarItems() function before, about a
> > bug in ItemBasedRecommender.
> > I think there is another issue, about performance.
> >
> > The mostSimilarItems function gives the list of items most similar to a
> > given item.
> > In computing those items, the algorithm looks at all other items in the
> > data model, but if no user has rated both items, there is no need to
> > check for a similarity between the active item and that item.
> >
> > This is the original function that returns the list of most similar
> > items, in cf.taste.impl.recommender.GenericItemBasedRecommender:
> >
> >   private List<RecommendedItem> doMostSimilarItems(long itemID,
> >                                                    int howMany,
> >                                                    TopItems.Estimator<Long> estimator)
> >       throws TasteException {
> >     DataModel model = getDataModel();
> >     // Candidate set: every item in the data model
> >     FastIDSet allItemIDs = new FastIDSet(model.getNumItems());
> >     LongPrimitiveIterator it = model.getItemIDs();
> >     while (it.hasNext()) {
> >       allItemIDs.add(it.nextLong());
> >     }
> >     allItemIDs.remove(itemID);
> >     return TopItems.getTopItems(howMany, allItemIDs.iterator(), null, estimator);
> >   }
> >
> > I updated it and use it this way:
> >
> >   private List<RecommendedItem> doMostSimilarItems(long itemID,
> >                                                    int howMany,
> >                                                    TopItems.Estimator<Long> estimator)
> >       throws TasteException {
> >     DataModel model = getDataModel();
> >     // Candidate set: only items rated by some user who also rated itemID
> >     FastIDSet set = new FastIDSet();
> >     PreferenceArray arr = model.getPreferencesForItem(itemID);
> >     for (int i = 0; i < arr.length(); i++) {
> >       set.addAll(model.getItemIDsFromUser(arr.get(i).getUserID()));
> >     }
> >     set.remove(itemID);
> >     return TopItems.getTopItems(howMany, set.iterator(), null, estimator);
> >   }
> >
> > The only difference between the two functions is:
> > the original one passes all items to getTopItems;
> > mine passes only the items that have at least one user who has rated
> > both the active item and that item.
> >
> > This little change made the algorithm considerably faster.
> > (For my data set it runs 4 times faster now.)
> >
> > I wanted to inform you, in case you want to try it and update the code.
> > If the original version of the code is better for some other reason,
> > please let me know.

--
Ted Dunning, CTO
DeepDyve
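
For anyone who wants to try the candidate-pruning idea from the quoted message outside of Taste, here is a minimal sketch using only java.util collections. It is not the committed code: the class name CoRatedCandidates, the two map parameters, and the invert helper are all made up for illustration, and it assumes the preference data fits in memory as user-to-items and item-to-users maps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Mahout/Taste.
final class CoRatedCandidates {

  /** Returns every item that shares at least one rater with itemID, excluding itemID itself. */
  static Set<Long> candidates(long itemID,
                              Map<Long, Set<Long>> usersByItem,
                              Map<Long, Set<Long>> itemsByUser) {
    Set<Long> result = new HashSet<>();
    // Walk only the users who rated itemID, and collect everything else they rated.
    for (long userID : usersByItem.getOrDefault(itemID, Set.of())) {
      result.addAll(itemsByUser.getOrDefault(userID, Set.of()));
    }
    result.remove(itemID);
    return result;
  }

  /** Builds the item-to-users index from a user-to-items map. */
  static Map<Long, Set<Long>> invert(Map<Long, Set<Long>> itemsByUser) {
    Map<Long, Set<Long>> usersByItem = new HashMap<>();
    for (Map.Entry<Long, Set<Long>> e : itemsByUser.entrySet()) {
      for (Long itemID : e.getValue()) {
        usersByItem.computeIfAbsent(itemID, k -> new HashSet<>()).add(e.getKey());
      }
    }
    return usersByItem;
  }

  public static void main(String[] args) {
    // Users 1 and 2 both rated item 10; user 3 rated only item 13.
    Map<Long, Set<Long>> itemsByUser = Map.of(
        1L, Set.of(10L, 11L),
        2L, Set.of(10L, 12L),
        3L, Set.of(13L));
    Map<Long, Set<Long>> usersByItem = invert(itemsByUser);
    // Item 13 has no rater in common with item 10, so it is never scored.
    System.out.println(new TreeSet<>(candidates(10L, usersByItem, itemsByUser))); // prints [11, 12]
  }
}
```

The saving comes from how sparse the data is: the work is proportional to the number of preferences held by users who rated the item, not to the total number of items. On dense data, where nearly every pair of items shares a rater, the candidate set approaches the full item set and the benefit shrinks.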
