Right now I have only a few documents. I just want to know what kind of similarity it generates, as I have no idea on what basis the similarity is computed.

-----Original Message-----
From: Sebastian Schelter [mailto:[email protected]]
Sent: Tuesday, October 26, 2010 2:37 PM
To: [email protected]
Subject: Re: generate document-document similarity matrix

Hi,

how many documents do you have and what kind of similarity do you wanna use?

--sebastian

On 26.10.2010 08:10, Divya wrote:
> Hi,
>
> I am a new Mahout user and am using Mahout 0.4 with Eclipse.
>
> I need to generate a document similarity matrix from the vector file which I
> have already created using SparseVectorsFromSequenceFiles
>
> Now I need to generate the document similarity matrix.
>
> Which gave me
>
> Directory structure
>
> -> df-count
> -> tfidf-vectors
> -> tf-vectors
> -> tokenized-documents
> -> wordcount
> -> .dictionary.file-0.crc
> -> .frequency.file-0.crc
> -> dictionary.file-0
> -> frequency.file-0
>
> I am confused now about which one to use.
>
> Which Mahout utility computes the document-document similarity matrix?
>
> Can anyone help me?
>
> Regards,
> Divya
