Hi Sebastian,
Where can I get the value for numberOfColumns?
How can I calculate how many columns my matrix has, given that
SparseVectorsFromSequenceFiles generates vectors in binary format?

Regards,
Divya 

-----Original Message-----
From: Sebastian Schelter [mailto:[email protected]] 
Sent: Thursday, October 28, 2010 4:28 PM
To: [email protected]
Subject: Re: generate similar documents

Hi Divya,

--similarityClassname should point to an implementation of 
org.apache.mahout.math.hadoop.similarity.vector.DistributedVectorSimilarity.

You can use any value from 
org.apache.mahout.math.hadoop.similarity.SimilarityType to select a 
predefined similarity measure, or you can point to an implementation of 
your own.

--numberOfColumns is the number of columns of the input matrix, which 
would be the number of unique terms, as I suppose your matrix is 
documents x terms.
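
To make the "number of unique terms" idea concrete, here is a minimal sketch in plain Java. It does not use any Mahout API and does not read the binary vectors; the class name ColumnCount, the helper numberOfColumns and the sample documents are all hypothetical. It only assumes that each document has already been tokenized into terms, and shows that in a documents x terms matrix the column count is the size of the vocabulary.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ColumnCount {

    // Hypothetical helper (not part of Mahout): counts the distinct terms
    // over all tokenized documents, i.e. the vocabulary size.
    static int numberOfColumns(List<List<String>> tokenizedDocs) {
        Set<String> vocabulary = new HashSet<>();
        for (List<String> doc : tokenizedDocs) {
            vocabulary.addAll(doc);
        }
        return vocabulary.size();
    }

    public static void main(String[] args) {
        // Made-up sample data: three documents, each a list of terms.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("apache", "mahout", "vectors"),
                Arrays.asList("mahout", "similarity"),
                Arrays.asList("similarity", "of", "documents"));

        // One matrix row per document, one column per distinct term.
        System.out.println("rows = " + docs.size());
        System.out.println("numberOfColumns = " + numberOfColumns(docs));
    }
}
```

For the sample data this gives 3 rows and 6 columns. With real data the count has to come from whatever vocabulary the vectorization step actually used, since its tokenization and pruning settings decide which terms end up as columns.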

--sebastian

On 28.10.2010 10:11, Divya wrote:
> Hi,
>
> I have a directory of documents from which I have generated a sequence file
> using SequenceFilesFromDirectory and then converted it into vectors using
> SparseVectorsFromSequenceFiles.
>
> Now I am referring to the link below to generate a list of the most similar
> documents:
>
>
>
>
http://mail-archives.apache.org/mod_mbox/mahout-user/201007.mbox/%3C4C2E3EED
> [email protected]%3e
>
>
>
> How can I use RowSimilarityJob to generate a list of similar documents?
>
>
>
> <ol>
>   * <li>-Dmapred.input.dir=(path): Directory containing a {@link DistributedRowMatrix} as a
>   * SequenceFile<IntWritable,VectorWritable></li>
>   * <li>-Dmapred.output.dir=(path): output path where the computations output should go
>   * (a {@link DistributedRowMatrix} stored as a SequenceFile<IntWritable,VectorWritable>)</li>
>   * <li>--numberOfColumns: the number of columns in the input matrix</li>
>   * <li>--similarityClassname (classname): an implementation of
>   * {@link DistributedVectorSimilarity} used to compute the similarity</li>
>   * <li>--maxSimilaritiesPerRow (integer): cap the number of similar rows per row
>   * to this number (100)</li>
>   * </ol>
>
>
>
> Which values should I pass for numberOfColumns and similarityClassname?
>
>
>
>
>
> Regards,
>
> Divya
>
>
>    

