Re: generate similar documents

Grant Ingersoll Thu, 28 Oct 2010 04:15:53 -0700

Hi,

Please ask these questions on u...@mahout.apache.org.  The dev@ mailing list is 
geared towards the development of the Mahout code, while the user list is 
geared towards questions on how to use Mahout.


Thanks,
Grant


On Oct 28, 2010, at 4:11 AM, Divya wrote:

> Hi,
> 
> I have a directory of documents from which I generated a sequence file
> using SequenceFilesFromDirectory, and then converted it into vectors
> using SparseVectorsFromSequenceFiles.
> 
> I am now following the link below to generate a list of the most similar documents:
> 
> 
> 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201007.mbox/%3C4C2E3EED.6070...@googlemail.com%3e
> 
> 
> 
> How can I use RowSimilarityJob to generate a list of similar documents?
> 
> 
> 
> * <ol>
> * <li>-Dmapred.input.dir=(path): Directory containing a {@link DistributedRowMatrix} as a
> *   SequenceFile<IntWritable,VectorWritable></li>
> * <li>-Dmapred.output.dir=(path): output path where the computations output should go (a
> *   {@link DistributedRowMatrix} stored as a SequenceFile<IntWritable,VectorWritable>)</li>
> * <li>--numberOfColumns: the number of columns in the input matrix</li>
> * <li>--similarityClassname (classname): an implementation of
> *   {@link DistributedVectorSimilarity} used to compute the similarity</li>
> * <li>--maxSimilaritiesPerRow (integer): cap the number of similar rows per row to this
> *   number (100)</li>
> * </ol>
> 
> 
> 
> Which values should I pass for numberOfColumns and similarityClassname?
> 
> 
> 
> 
> 
> Regards,
> 
> Divya 
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search
