Hi,

RowSimilarityJob by itself does not do down-sampling.

The down-sampling is done by the ToItemVectorsMapper in the
PreparePreferenceMatrixJob which is responsible for preparing the inputs
(the matrix of interactions between users and items) for
ItemSimilarityJob and RecommenderJob. As Sean noted, the option
"maxPrefsPerUser" controls the sampling. By default, we keep at most
1000 sampled preferences per user.
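
To make the idea concrete, here is a minimal sketch of capping the number
of preferences kept for one user. This is not the actual
ToItemVectorsMapper code: the class name PrefSampler, the method sample,
and the uniform shuffle-and-truncate strategy are all assumptions made
for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PrefSampler {

  // Hypothetical helper (not part of Mahout): returns at most
  // maxPrefsPerUser item IDs for a single user, chosen uniformly at random.
  // Users at or below the cap are returned unchanged.
  public static List<Long> sample(List<Long> itemIds, int maxPrefsPerUser, Random random) {
    if (itemIds.size() <= maxPrefsPerUser) {
      return new ArrayList<>(itemIds);
    }
    // Shuffle a copy so the caller's list is not modified, then truncate.
    List<Long> shuffled = new ArrayList<>(itemIds);
    Collections.shuffle(shuffled, random);
    return new ArrayList<>(shuffled.subList(0, maxPrefsPerUser));
  }

  public static void main(String[] args) {
    // A user with 5000 interactions is cut down to the cap of 1000.
    List<Long> items = new ArrayList<>();
    for (long i = 0; i < 5000; i++) {
      items.add(i);
    }
    List<Long> kept = sample(items, 1000, new Random(42));
    System.out.println(kept.size()); // prints 1000
  }
}
```

Capping per user this way bounds the number of co-occurring item pairs
that a single very active user can contribute, which is what keeps the
similarity computation tractable.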

We could also move the sampling directly to RowSimilarityJob if people
consider this more useful.

Best,
Sebastian


On 18.06.2013 22:50, Ted Dunning wrote:
> But RecommenderJob seems to call RowSimilarityJob first.  That is where
> sampling needs to be done.
> 
>       //calculate the co-occurrence matrix
>       ToolRunner.run(getConf(), new RowSimilarityJob(), new String[]{
>         "--input", new Path(prepPath,
> PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
>         "--output", similarityMatrixPath.toString(),
>         "--numberOfColumns", String.valueOf(numberOfUsers),
>         "--similarityClassname", similarityClassname,
>         "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItem),
>         "--excludeSelfSimilarity", String.valueOf(Boolean.TRUE),
>         "--threshold", String.valueOf(threshold),
>         "--tempDir", getTempPath().toString(),
>       });
> 
>       // write out the similarity matrix if the user specified that behavior
>       if (hasOption("outputPathForSimilarityMatrix")) {
>         Path outputPathForSimilarityMatrix = new
> Path(getOption("outputPathForSimilarityMatrix"));
> 
>         Job outputSimilarityMatrix = prepareJob(similarityMatrixPath,
> outputPathForSimilarityMatrix,
>             SequenceFileInputFormat.class,
> ItemSimilarityJob.MostSimilarItemPairsMapper.class,
>             EntityEntityWritable.class, DoubleWritable.class,
> ItemSimilarityJob.MostSimilarItemPairsReducer.class,
>             EntityEntityWritable.class, DoubleWritable.class,
> TextOutputFormat.class);
> 
>         Configuration mostSimilarItemsConf =
> outputSimilarityMatrix.getConfiguration();
>         mostSimilarItemsConf.set(ItemSimilarityJob.ITEM_ID_INDEX_PATH_STR,
>             new Path(prepPath,
> PreparePreferenceMatrixJob.ITEMID_INDEX).toString());
> 
> mostSimilarItemsConf.setInt(ItemSimilarityJob.MAX_SIMILARITIES_PER_ITEM,
> maxSimilaritiesPerItem);
>         outputSimilarityMatrix.waitForCompletion(true);
>       }
>     }
> 
> 
> 
> 
> On Tue, Jun 18, 2013 at 10:47 PM, Sean Owen <sro...@gmail.com> wrote:
> 
>> No, it's in ItemSimilarityJob -- I'm looking at it now. It ends up
>> setting ToItemVectorsMapper.SAMPLE_SIZE, if that helps.
>>
>> On Tue, Jun 18, 2013 at 9:43 PM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>>> Ahh... only effective in RecommenderJob.
>>
> 
