Hi, I have a directory of documents from which I generated a SequenceFile using SequenceFilesFromDirectory, and then converted it into vectors using SparseVectorsFromSequenceFiles.
Now, following the thread below, I want to generate a list of the most similar documents:
http://mail-archives.apache.org/mod_mbox/mahout-user/201007.mbox/%3C4C2E3EED.6070...@googlemail.com%3e

How can I use RowSimilarityJob to generate a list of similar documents? Its javadoc lists these arguments:

  1. -Dmapred.input.dir=(path): directory containing a DistributedRowMatrix as a SequenceFile<IntWritable,VectorWritable>
  2. -Dmapred.output.dir=(path): output path where the computation's output should go (a DistributedRowMatrix stored as a SequenceFile<IntWritable,VectorWritable>)
  3. --numberOfColumns: the number of columns in the input matrix
  4. --similarityClassname (classname): an implementation of DistributedVectorSimilarity used to compute the similarity
  5. --maxSimilaritiesPerRow (integer): cap the number of similar rows per row to this number (100)

Which values should I pass for --numberOfColumns and --similarityClassname?

Regards,
Divya
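As a rough illustration of what the RowSimilarityJob options above control (a standalone sketch, not Mahout's actual implementation), the Java code below treats each document as one sparse row keyed by column (term) index. It scores every pair of rows with cosine similarity, used here only as a stand-in for whichever --similarityClassname implementation is chosen, and keeps at most maxSimilaritiesPerRow neighbours per row. The class and method names (RowSimilaritySketch, cosine, mostSimilar) are hypothetical. In the toy data the matrix has 4 columns, one per vocabulary term, assuming each column index corresponds to one term.

```java
import java.util.*;

/**
 * Hypothetical illustration only, not Mahout's RowSimilarityJob:
 * rows are sparse document vectors (column index -> weight), and for each row
 * we keep at most maxSimilaritiesPerRow other rows ranked by cosine similarity.
 */
public class RowSimilaritySketch {

    /** Euclidean length of a sparse vector. */
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity of two sparse vectors; 0 if either vector is all zeros. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller map when computing the dot product.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = small == a ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    /** For each row, the indices of its most similar other rows, best first. */
    static List<List<Integer>> mostSimilar(List<Map<Integer, Double>> rows, int maxSimilaritiesPerRow) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            final int row = i;
            List<Integer> candidates = new ArrayList<>();
            for (int j = 0; j < rows.size(); j++) {
                if (j != row) {
                    candidates.add(j);
                }
            }
            // Stable sort by descending similarity; ties keep their original order.
            candidates.sort((x, y) -> Double.compare(
                    cosine(rows.get(row), rows.get(y)),
                    cosine(rows.get(row), rows.get(x))));
            int keep = Math.min(maxSimilaritiesPerRow, candidates.size());
            result.add(new ArrayList<>(candidates.subList(0, keep)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three toy "documents" over a 4-term vocabulary (4 columns).
        List<Map<Integer, Double>> docs = List.of(
                Map.of(0, 1.0, 1, 1.0),
                Map.of(0, 1.0, 1, 0.9),
                Map.of(2, 1.0, 3, 1.0));
        System.out.println(mostSimilar(docs, 1)); // prints [[1], [0], [0]]
    }
}
```

Documents 0 and 1 share both terms, so each is the other's nearest neighbour; document 2 shares no terms with either, so its single slot goes to the first of two tied (zero-similarity) candidates. The all-pairs loop is quadratic in the number of documents and is only meant to show the idea on a handful of rows.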