The MoreLikeThis feature may be exactly what you want. Try it out.

On Thu, Dec 30, 2010 at 8:28 AM, Amel Fraisse <amel.frai...@gmail.com> wrote:
> Hello,
>
> No I'm not using cosine similarity metrics.
>
> 2010/12/30 Shashi Kant <sk...@sloan.mit.edu>
>
>> Have you considered using document similarity metrics such as Cosine
>> Similarity?
>>
>> On Thu, Dec 30, 2010 at 6:05 AM, Amel Fraisse <amel.frai...@gmail.com> wrote:
>> > Hello,
>> >
>> > I am using Lucene for plagiarism detection.
>> >
>> > The goal is that: when I have a new document, I will check on the solr index
>> > if there is a document that contain some common chunk.
>> >
>> > So to compute similarity between the query and a source document I would use
>> > this formula :
>> >
>> > Score (suspicious document, source document) = Number of common chunk
>> > between source document and suspicious document / Number of total chunk in
>> > the suspicious document.
>> >
>> > So I have to change the scoring formula in the Similarity class.
>> >
>> > How can I change the scoring formula? ( by customizing only the Similarity
>> > class? or Scorer?)
>> >
>> > Do you have an Example of this use case?
>> >
>> > Thank for your help.
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
> --
> --------------
> Amel Fraisse
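
For reference, the ratio described in the quoted message can be sketched in plain Java, outside of Lucene. This is only an illustration under assumptions that are not in the thread: a "chunk" is taken to be a run of `size` consecutive whitespace-separated words, chunks are compared case-insensitively, and repeated chunks are counted once. The class `ChunkOverlap` and its methods are hypothetical names, not part of Lucene or Solr.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not a Lucene or Solr class.
final class ChunkOverlap {

    // Assumption: a chunk is `size` consecutive whitespace-separated words,
    // lower-cased. Returns the set of distinct chunks in the text.
    static Set<String> chunks(String text, int size) {
        Set<String> result = new HashSet<>();
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty() || size <= 0) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + size <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return result;
    }

    // Score = (distinct chunks common to both documents)
    //         / (distinct chunks in the suspicious document).
    // Returns 0.0 when the suspicious document yields no chunks.
    static double score(String suspicious, String source, int size) {
        Set<String> suspiciousChunks = chunks(suspicious, size);
        if (suspiciousChunks.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(suspiciousChunks);
        common.retainAll(chunks(source, size));
        return (double) common.size() / suspiciousChunks.size();
    }
}
```

For example, `ChunkOverlap.score("the quick brown fox jumps", "a quick brown fox sleeps", 2)` compares four two-word chunks from the suspicious text against the source and finds two in common. The sketch does not show how to plug the ratio into a Similarity or Scorer subclass; it could be used to re-score candidate documents after an ordinary search or a MoreLikeThis query has retrieved them.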
--
Lance Norskog
goks...@gmail.com