This thread asks exactly my question: http://www.mail-archive.com/[email protected]/msg04915.html
--- On Mon, 6/29/09, Amir Hossein Jadidinejad <[email protected]> wrote:

From: Amir Hossein Jadidinejad <[email protected]>
Subject: Doc-Doc Similarity Matrix Construction
To: [email protected]
Date: Monday, June 29, 2009, 3:14 PM

Hi,

This is my first experiment with Lucene, so I would appreciate some help. I'm going to index a set of documents and create a feature vector for each of them. Each vector contains all the terms that occur in the document, weighted by TF-IDF. After that, I want to compute the cosine similarity between every pair of documents and produce a doc-doc similarity matrix. My document set is large, so a scalable implementation is important. Could you please give me a guideline or to-do list?

Thank you and kind regards.
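
For reference, the pipeline described above (TF-IDF vector per document, then pairwise cosine similarity) could be sketched in plain Java roughly as follows. This is only a minimal illustration, not Lucene code: the class and method names (DocSimilarity, tfidfVectors, similarityMatrix) are hypothetical, the documents are assumed to be tokenized already, and the smoothed idf formula is just one common variant picked for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; it does not use any Lucene API.
class DocSimilarity {

    /** Builds one L2-normalized sparse TF-IDF vector (term -> weight) per document. */
    static List<Map<String, Double>> tfidfVectors(List<List<String>> docs) {
        // Document frequency: in how many documents each term occurs.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            // Raw term frequency within this document.
            Map<String, Double> vec = new HashMap<>();
            for (String term : doc) {
                vec.merge(term, 1.0, Double::sum);
            }
            // Assumed weighting: tf * (ln((1 + n) / (1 + df)) + 1).
            double normSq = 0.0;
            for (Map.Entry<String, Double> e : vec.entrySet()) {
                double idf = Math.log((1.0 + n) / (1.0 + df.get(e.getKey()))) + 1.0;
                double w = e.getValue() * idf;
                e.setValue(w);
                normSq += w * w;
            }
            // Normalize to unit length so cosine similarity becomes a dot product.
            double norm = Math.sqrt(normSq);
            if (norm > 0) {
                for (Map.Entry<String, Double> e : vec.entrySet()) {
                    e.setValue(e.getValue() / norm);
                }
            }
            vectors.add(vec);
        }
        return vectors;
    }

    /** Sparse dot product; iterates over the smaller of the two maps. */
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        if (a.size() > b.size()) {
            Map<String, Double> t = a;
            a = b;
            b = t;
        }
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }

    /** Symmetric doc-doc cosine similarity matrix; only the upper triangle is computed. */
    static double[][] similarityMatrix(List<List<String>> docs) {
        List<Map<String, Double>> v = tfidfVectors(docs);
        int n = v.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = dot(v.get(i), v.get(j));
                sim[i][j] = s;
                sim[j][i] = s;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "search"),
                Arrays.asList("lucene", "similarity", "matrix"),
                Arrays.asList("cosine", "similarity", "matrix"));
        for (double[] row : similarityMatrix(docs)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Note that the all-pairs loop is quadratic in the number of documents, so for a large collection it would probably have to be restricted to pairs that share at least one term (for example by walking an inverted index), and the dense matrix replaced by a sparse or on-disk structure.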
