[ https://issues.apache.org/jira/browse/MAHOUT-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852831#action_12852831 ]
Sean Owen commented on MAHOUT-359:
----------------------------------

No preference is represented by the absence of a preference ('null', maybe). It's not normally represented by a preference value of 0. But yes, when put into a user vector, we have to give a value, so nonexistent preferences are modelled as 0. Unfortunately, this makes a preference of 0 indistinguishable from no preference in these Hadoop-based, vector-based implementations, but it doesn't usually cause an issue in practice.

The loop in RecommenderMapper iterates only over non-zero values, so 'value' is never 0 in the line you cite. In the case of boolean preferences, all values are 1. (I guess I could optimize this and avoid the multiplication.) But that's not your issue, is it?

I also agree we can optimize findTopNPrefsCutoff(). For boolean data, the cutoff is 1.0, so all preferences are kept. We might want to keep a random n items instead. For now it's not broken, right? It's just keeping more data than we might want.

Does that resolve your issue? Otherwise, maybe you can help me understand the problem you are having.
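To illustrate why a stored 0 and a missing preference look the same once they are in a vector, here is a minimal sketch. ToySparseVector is a hypothetical class backed by a TreeMap, not Mahout's Vector API. It stores only non-zero entries and reads absent indices back as 0.0, which is the behavior described above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy sparse vector (not Mahout's Vector API). */
public class ToySparseVector {
  // Only non-zero entries are stored; absent indices read back as 0.0.
  private final Map<Integer, Double> entries = new TreeMap<>();

  public void set(int index, double value) {
    if (value == 0.0) {
      entries.remove(index); // an explicit 0.0 is dropped, exactly like "no preference"
    } else {
      entries.put(index, value);
    }
  }

  public double get(int index) {
    return entries.getOrDefault(index, 0.0);
  }

  /** Number of explicitly stored (non-zero) entries. */
  public int numNonZero() {
    return entries.size();
  }

  public static void main(String[] args) {
    ToySparseVector user = new ToySparseVector();
    user.set(7, 0.0); // explicit preference of 0
    user.set(9, 1.0); // boolean-style preference
    // Item 7 (explicit 0) and item 42 (never rated) cannot be told apart:
    System.out.println(user.get(7) == user.get(42)); // true
    System.out.println(user.numNonZero());           // 1
  }
}
```

An iteration over non-zero entries (the role iterateNonZero() plays in RecommenderMapper) would visit only item 9 here. That is why 'value' in the mapper loop is never 0.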
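The scoring step in RecommenderMapper sums co-occurrence columns, each weighted by the user's preference for that item. Below is a hypothetical sketch of that idea that uses plain maps instead of Mahout vectors. It includes a boolean variant that skips the multiplication, since every value is 1.0. The class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of item-based scoring; plain maps stand in for Mahout vectors. */
public class CooccurrenceScoring {

  /** General case: each co-occurrence column is weighted by the preference value. */
  public static Map<Integer, Double> score(Map<Integer, Double> userPrefs,
                                           Map<Integer, Map<Integer, Double>> columns) {
    Map<Integer, Double> scores = new HashMap<>();
    for (Map.Entry<Integer, Double> pref : userPrefs.entrySet()) {
      double value = pref.getValue();
      if (value == 0.0) {
        continue; // mirrors iterateNonZero(): zero entries are never visited
      }
      Map<Integer, Double> column = columns.get(pref.getKey());
      if (column == null) {
        continue;
      }
      for (Map.Entry<Integer, Double> c : column.entrySet()) {
        scores.merge(c.getKey(), c.getValue() * value, Double::sum);
      }
    }
    return scores;
  }

  /** Boolean case: every preference is 1.0, so columns are simply added. */
  public static Map<Integer, Double> scoreBoolean(Iterable<Integer> preferredItems,
                                                  Map<Integer, Map<Integer, Double>> columns) {
    Map<Integer, Double> scores = new HashMap<>();
    for (int item : preferredItems) {
      Map<Integer, Double> column = columns.get(item);
      if (column == null) {
        continue;
      }
      for (Map.Entry<Integer, Double> c : column.entrySet()) {
        scores.merge(c.getKey(), c.getValue(), Double::sum);
      }
    }
    return scores;
  }

  public static void main(String[] args) {
    Map<Integer, Map<Integer, Double>> columns = new HashMap<>();
    columns.put(1, Map.of(2, 3.0, 3, 1.0));
    columns.put(4, Map.of(2, 1.0));
    Map<Integer, Double> prefs = new HashMap<>();
    prefs.put(1, 1.0);
    prefs.put(4, 1.0);
    System.out.println(score(prefs, columns).get(2));                // 4.0
    System.out.println(scoreBoolean(List.of(1, 4), columns).get(2)); // 4.0
  }
}
```

With boolean data both methods give identical scores. The second one just avoids the per-entry multiply.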
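The sketch below shows why a value cutoff keeps everything for boolean data, and how keeping "a random n items" could work instead. It assumes the cutoff is the n-th largest preference value and that everything at or above it is kept. TopNPrefs, cutoff() and keepTopN() are hypothetical names; this is not Mahout's actual findTopNPrefsCutoff(). The random variant shuffles first and then applies a stable sort, so ties (such as all-1.0 boolean values) are broken at random.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of top-N preference pruning (not Mahout's implementation). */
public class TopNPrefs {

  /** Assumed semantics: the n-th largest value; prefs >= this value are kept. */
  public static double cutoff(List<Double> values, int n) {
    if (n <= 0) {
      throw new IllegalArgumentException("n must be positive");
    }
    if (values.size() <= n) {
      return Double.NEGATIVE_INFINITY; // n or fewer prefs: keep them all
    }
    List<Double> sorted = new ArrayList<>(values);
    sorted.sort(Collections.reverseOrder());
    return sorted.get(n - 1);
  }

  /** Keeps exactly n prefs (or all, if fewer), choosing randomly among equal values. */
  public static List<Map.Entry<Long, Double>> keepTopN(Map<Long, Double> prefs, int n, Random random) {
    List<Map.Entry<Long, Double>> entries = new ArrayList<>(prefs.entrySet());
    Collections.shuffle(entries, random); // random order among equal values
    // List.sort is stable, so the shuffled order survives for ties.
    entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
    return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
  }

  public static void main(String[] args) {
    Map<Long, Double> booleanPrefs = new HashMap<>();
    for (long item = 1; item <= 5; item++) {
      booleanPrefs.put(item, 1.0);
    }
    // With boolean data the cutoff is 1.0, so a ">= cutoff" test keeps all 5 prefs.
    System.out.println(cutoff(new ArrayList<>(booleanPrefs.values()), 3)); // 1.0
    // The random variant keeps exactly 3 of them.
    System.out.println(keepTopN(booleanPrefs, 3, new Random(42)).size()); // 3
  }
}
```

Breaking ties at random keeps the data size bounded without favoring any particular items, at the cost of making the output depend on the random seed.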
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob for Boolean recommendation
> --------------------------------------------------------------------------------
>
>                 Key: MAHOUT-359
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-359
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Hui Wen Han
>
> In some cases there is no preference value in the input data, so the preference value is set to zero. Then, in RecommenderMapper.class:
>
>   @Override
>   public void map(LongWritable userID,
>                   VectorWritable vectorWritable,
>                   OutputCollector<LongWritable,RecommendedItemsWritable> output,
>                   Reporter reporter) throws IOException {
>
>     if ((usersToRecommendFor != null) && !usersToRecommendFor.contains(userID.get())) {
>       return;
>     }
>     Vector userVector = vectorWritable.get();
>     Iterator<Vector.Element> userVectorIterator = userVector.iterateNonZero();
>     Vector recommendationVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 1000);
>     while (userVectorIterator.hasNext()) {
>       Vector.Element element = userVectorIterator.next();
>       int index = element.index();
>       double value = element.get(); // here value will be 0.0 for Boolean recommendation
>       Vector columnVector;
>       try {
>         columnVector = cooccurrenceColumnCache.get(new IntWritable(index));
>       } catch (TasteException te) {
>         if (te.getCause() instanceof IOException) {
>           throw (IOException) te.getCause();
>         } else {
>           throw new IOException(te.getCause());
>         }
>       }
>       if (columnVector != null) {
>         columnVector.times(value).addTo(recommendationVector); // here all scores will be set to zero for Boolean recommendation
>       }
>     }
>
>     Queue<RecommendedItem> topItems = new PriorityQueue<RecommendedItem>(recommendationsPerUser + 1,
>         Collections.reverseOrder());
>
>     Iterator<Vector.Element> recommendationVectorIterator = recommendationVector.iterateNonZero();
>     LongWritable itemID = new LongWritable();
>     while (recommendationVectorIterator.hasNext()) {
>       Vector.Element element = recommendationVectorIterator.next();
>       int index = element.index();
>       if (userVector.get(index) == 0.0) {
>         if (topItems.size() < recommendationsPerUser) {
>           indexItemIDMap.get(new IntWritable(index), itemID);
>           topItems.add(new GenericRecommendedItem(itemID.get(), (float) element.get()));
>         } else if (element.get() > topItems.peek().getValue()) {
>           indexItemIDMap.get(new IntWritable(index), itemID);
>           topItems.add(new GenericRecommendedItem(itemID.get(), (float) element.get()));
>           topItems.poll();
>         }
>       }
>     }
>
>     List<RecommendedItem> recommendations = new ArrayList<RecommendedItem>(topItems.size());
>     recommendations.addAll(topItems);
>     Collections.sort(recommendations);
>     output.collect(userID, new RecommendedItemsWritable(recommendations));
>   }
>
> So maybe we need an option to distinguish Boolean recommendation from slope-one recommendation.
> In ToUserVectorReducer.class there is no need for findTopNPrefsCutoff; maybe just take all items.
> It's just my thinking; maybe the item package is used for slope one only. :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.