Hi,

Please ask these questions on u...@mahout.apache.org. The dev@ mailing list is geared towards the development of the Mahout code, while the user list is for questions about how to use Mahout.
Thanks,
Grant

On Oct 28, 2010, at 4:11 AM, Divya wrote:

> Hi,
>
> I have a directory of documents from which I generated a SequenceFile
> using SequenceFilesFromDirectory, and then converted it into vectors
> with SparseVectorsFromSequenceFiles.
>
> Now I am following the link below to generate a list of the most similar documents:
>
> http://mail-archives.apache.org/mod_mbox/mahout-user/201007.mbox/%3C4C2E3EED.6070...@googlemail.com%3e
>
> How can I use RowSimilarityJob to generate a list of similar documents?
>
> <ol>
>  * <li>-Dmapred.input.dir=(path): Directory containing a {@link DistributedRowMatrix} as a
>  * SequenceFile<IntWritable,VectorWritable></li>
>  * <li>-Dmapred.output.dir=(path): output path where the computations output should go (a {@link DistributedRowMatrix}
>  * stored as a SequenceFile<IntWritable,VectorWritable>)</li>
>  * <li>--numberOfColumns: the number of columns in the input matrix</li>
>  * <li>--similarityClassname (classname): an implementation of {@link DistributedVectorSimilarity} used to compute the
>  * similarity</li>
>  * <li>--maxSimilaritiesPerRow (integer): cap the number of similar rows per row to this number (100)</li>
>  * </ol>
>
> Which arguments should I pass for numberOfColumns and similarityClassname?
>
> Regards,
> Divya

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search
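For readers landing on this thread, the following is a minimal in-memory Java sketch of the computation the quoted argument list describes. It is not Mahout's API, and RowSimilarityJob itself runs as a Hadoop job over a DistributedRowMatrix. The sketch assumes that rows are documents and columns are dictionary terms, so numberOfColumns is the cardinality of every row vector. Cosine similarity is an assumed stand-in for whichever DistributedVectorSimilarity implementation is passed via --similarityClassname. The names RowSimilaritySketch, Neighbour and mostSimilar are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory sketch (not Mahout's API) of what a row-similarity job computes.
 * Rows are sparse document vectors stored as column -> weight maps over numberOfColumns
 * columns (assumed: one column per dictionary term). Cosine similarity is an assumed
 * stand-in for whatever similarity implementation is configured.
 */
public class RowSimilaritySketch {

    /** A scored neighbour of a row (hypothetical helper type). */
    public static final class Neighbour {
        public final int row;
        public final double similarity;

        public Neighbour(int row, double similarity) {
            this.row = row;
            this.similarity = similarity;
        }
    }

    /** Cosine similarity of two sparse vectors; 0 if either vector is all zeros. */
    public static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    private static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /**
     * For each row, returns up to maxSimilaritiesPerRow other rows, most similar first.
     * numberOfColumns is the cardinality of every row vector; indices outside it are rejected.
     */
    public static List<List<Neighbour>> mostSimilar(List<Map<Integer, Double>> rows,
                                                    int numberOfColumns,
                                                    int maxSimilaritiesPerRow) {
        for (Map<Integer, Double> row : rows) {
            for (int col : row.keySet()) {
                if (col < 0 || col >= numberOfColumns) {
                    throw new IllegalArgumentException(
                        "column " + col + " is outside [0, " + numberOfColumns + ")");
                }
            }
        }
        List<List<Neighbour>> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            List<Neighbour> candidates = new ArrayList<>();
            for (int j = 0; j < rows.size(); j++) {
                if (i == j) {
                    continue; // a row is not reported as similar to itself
                }
                double s = cosine(rows.get(i), rows.get(j));
                if (s != 0.0) { // zero-similarity pairs are skipped
                    candidates.add(new Neighbour(j, s));
                }
            }
            // Highest similarity first, then keep at most maxSimilaritiesPerRow entries.
            candidates.sort(Comparator.comparingDouble((Neighbour n) -> n.similarity).reversed());
            int keep = Math.min(maxSimilaritiesPerRow, candidates.size());
            result.add(new ArrayList<>(candidates.subList(0, keep)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three toy documents over a 4-term vocabulary, so numberOfColumns = 4.
        List<Map<Integer, Double>> docs = List.of(
            Map.of(0, 1.0, 1, 1.0),
            Map.of(0, 1.0, 1, 1.0, 2, 1.0),
            Map.of(3, 1.0));
        List<List<Neighbour>> top = mostSimilar(docs, 4, 1);
        for (int i = 0; i < top.size(); i++) {
            for (Neighbour n : top.get(i)) {
                System.out.printf("doc %d -> doc %d (%.3f)%n", i, n.row, n.similarity);
            }
        }
    }
}
```

In this toy run, documents 0 and 1 share two terms and list each other as their single nearest neighbour. Document 2 shares no term with either and gets no neighbours. Passing a numberOfColumns smaller than the largest term index plus one is rejected, which mirrors why the value must match the width of the document vectors.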