[ 
https://issues.apache.org/jira/browse/MAHOUT-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852831#action_12852831
 ] 

Sean Owen commented on MAHOUT-359:
----------------------------------

No preference is represented by the absence of a preference -- 'null', maybe. 
It's not represented by a preference of value 0, normally.

But yes, when put into a user vector, we have to give a value. Nonexistent 
preferences are modelled as 0. Unfortunately, this makes preferences of 0 
indistinguishable from no preference in these Hadoop-based, vector-based 
implementations, but it doesn't usually cause an issue in practice.

The loop in RecommenderMapper loops only over non-zero values, so 'value' is 
never 0 in the line you cite.
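
To illustrate the two points above, here is a minimal sketch of a sparse user 
vector. SparseUserVector and its methods are hypothetical stand-ins written for 
this example; they are not Mahout's Vector API, and Mahout's own sparse vectors 
may differ in detail. The sketch only shows why a stored 0 and a missing entry 
read back the same, and why a non-zero iteration never yields a 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a sparse user vector; NOT Mahout's Vector API. */
final class SparseUserVector {
  // Only non-zero preferences are stored, keyed by item index.
  private final Map<Integer, Double> entries = new TreeMap<>();

  /** Storing 0.0 is treated the same as storing nothing: the entry is dropped. */
  void set(int index, double value) {
    if (value == 0.0) {
      entries.remove(index);
    } else {
      entries.put(index, value);
    }
  }

  /** Absent entries read back as 0.0, exactly like an explicit zero would. */
  double get(int index) {
    return entries.getOrDefault(index, 0.0);
  }

  /** Indices a non-zero iteration would visit; zero or absent entries never appear. */
  List<Integer> nonZeroIndices() {
    return new ArrayList<>(entries.keySet());
  }

  public static void main(String[] args) {
    SparseUserVector user = new SparseUserVector();
    user.set(3, 4.5); // a real preference
    user.set(7, 0.0); // an explicit preference of 0
    // Item 9 has no preference at all.
    System.out.println(user.get(7) == user.get(9)); // the two cases look identical
    System.out.println(user.nonZeroIndices());      // only item 3 is visited
  }
}
```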

In the case of boolean preferences, all values are 1. (I could optimize this 
and avoid the multiplication, I guess.) But that's not your issue, is it?
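
As a rough sketch of that optimization: when every preference value is 1, 
scaling a co-occurrence column by the value changes nothing, so the column can 
be added to the scores directly. The class and method names below are 
hypothetical, and plain maps stand in for Mahout vectors; this is not the 
RecommenderMapper code itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; plain maps stand in for Mahout's sparse vectors. */
final class BooleanScoreSketch {

  /** General case: scores += column * value. */
  static void addScaled(Map<Integer, Double> scores,
                        Map<Integer, Double> column,
                        double value) {
    for (Map.Entry<Integer, Double> e : column.entrySet()) {
      scores.merge(e.getKey(), e.getValue() * value, Double::sum);
    }
  }

  /** Boolean case: every value is 1.0, so the multiplication can be skipped. */
  static void addUnscaled(Map<Integer, Double> scores,
                          Map<Integer, Double> column) {
    for (Map.Entry<Integer, Double> e : column.entrySet()) {
      scores.merge(e.getKey(), e.getValue(), Double::sum);
    }
  }

  public static void main(String[] args) {
    // Made-up co-occurrence counts for one item, keyed by item index.
    Map<Integer, Double> column = new HashMap<>();
    column.put(1, 2.0);
    column.put(4, 5.0);

    Map<Integer, Double> scaled = new HashMap<>();
    Map<Integer, Double> unscaled = new HashMap<>();
    addScaled(scaled, column, 1.0);
    addUnscaled(unscaled, column);

    // With a value of 1.0 both paths produce the same scores.
    System.out.println(scaled.equals(unscaled));
  }
}
```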


I also agree we can optimize findTopNPrefsCutoff(). For boolean data, the 
cutoff is 1.0, and all preferences are kept. We might want to keep a random n 
items. For now, it's not broken, right? It's just keeping more data than we 
might desire.
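
One possible way to keep a random n items, sketched here with reservoir 
sampling. This is an assumption about how it could be done, not what 
ToUserVectorReducer does; RandomNItemsSketch and sample() are hypothetical 
names, and n is assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of keeping a uniformly random n item IDs in one pass. */
final class RandomNItemsSketch {

  static List<Long> sample(Iterable<Long> itemIDs, int n, Random random) {
    List<Long> kept = new ArrayList<>(n);
    int seen = 0;
    for (Long itemID : itemIDs) {
      seen++;
      if (kept.size() < n) {
        // The first n items always go into the reservoir.
        kept.add(itemID);
      } else {
        // Later items replace a kept one with probability n / seen.
        int slot = random.nextInt(seen); // uniform in [0, seen)
        if (slot < n) {
          kept.set(slot, itemID);
        }
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Long> items = new ArrayList<>();
    for (long i = 0; i < 10; i++) {
      items.add(i);
    }
    // Keeps 3 of the 10 item IDs; which ones depends on the Random instance.
    System.out.println(sample(items, 3, new Random()));
  }
}
```

Unlike a value cutoff, this bounds the number of items kept per user even when 
all preference values are equal.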


Does that resolve your issue? If not, maybe you can help me understand the 
problem you are having.

> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob for Boolean 
> recommendation
> --------------------------------------------------------------------------------
>
>                 Key: MAHOUT-359
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-359
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Hui Wen Han
>
> In some cases there is no preference value in the input data, so the 
> preference value is set to zero. Then, in RecommenderMapper.class:
>  @Override
>   public void map(LongWritable userID,
>                   VectorWritable vectorWritable,
>                   OutputCollector<LongWritable,RecommendedItemsWritable> output,
>                   Reporter reporter) throws IOException {
>     
>     if ((usersToRecommendFor != null) && !usersToRecommendFor.contains(userID.get())) {
>       return;
>     }
>     Vector userVector = vectorWritable.get();
>     Iterator<Vector.Element> userVectorIterator = userVector.iterateNonZero();
>     Vector recommendationVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 1000);
>     while (userVectorIterator.hasNext()) {
>       Vector.Element element = userVectorIterator.next();
>       int index = element.index();
>       double value = element.get();     //here will get 0.0 for Boolean recommendation
>       Vector columnVector;
>       try {
>         columnVector = cooccurrenceColumnCache.get(new IntWritable(index));
>       } catch (TasteException te) {
>         if (te.getCause() instanceof IOException) {
>           throw (IOException) te.getCause();
>         } else {
>           throw new IOException(te.getCause());
>         }
>       }
>       if (columnVector != null) {
>         columnVector.times(value).addTo(recommendationVector); //here will set all score value to zero for Boolean recommendation
>       }
>     }
>     
>     Queue<RecommendedItem> topItems = new PriorityQueue<RecommendedItem>(recommendationsPerUser + 1,
>         Collections.reverseOrder());
>     
>     Iterator<Vector.Element> recommendationVectorIterator = recommendationVector.iterateNonZero();
>     LongWritable itemID = new LongWritable();
>     while (recommendationVectorIterator.hasNext()) {
>       Vector.Element element = recommendationVectorIterator.next();
>       int index = element.index();
>       if (userVector.get(index) == 0.0) {
>         if (topItems.size() < recommendationsPerUser) {
>           indexItemIDMap.get(new IntWritable(index), itemID);
>           topItems.add(new GenericRecommendedItem(itemID.get(), (float) element.get()));
>         } else if (element.get() > topItems.peek().getValue()) {
>           indexItemIDMap.get(new IntWritable(index), itemID);
>           topItems.add(new GenericRecommendedItem(itemID.get(), (float) element.get()));
>           topItems.poll();
>         }
>       }
>     }
>     
>     List<RecommendedItem> recommendations = new ArrayList<RecommendedItem>(topItems.size());
>     recommendations.addAll(topItems);
>     Collections.sort(recommendations);
>     output.collect(userID, new RecommendedItemsWritable(recommendations));
>   }
> So maybe we need an option to distinguish Boolean recommendation from slope 
> one recommendation.
> In ToUserVectorReducer.class there is no need for findTopNPrefsCutoff here; 
> maybe just take all items.
> This is just my thinking; maybe item is used for slope one only.
> :)
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
