Thanks much for the reply.

Paul Elschot <[EMAIL PROTECTED]> wrote:
On Monday 20 December 2004 15:09, Gururaja H wrote:
> Hi,
>
> But, How to calculate the coord() fraction ? I know by default,
> in DefaultSimilarity the coord() fraction is defined as below:
>
>   /** Implemented as overlap / maxOverlap. */
>   public float coord(int overlap, int maxOverlap) {
>     return overlap / (float)maxOverlap;
>   }
>
> How to get the overlap and maxOverlap value in each of the matched document(s) ?

In case you only want the coordination factor to have more influence on the
order of your search results, you can use a Similarity with a coord() function
that raises the fraction to a power higher than 1:

  public float coord(int overlap, int maxOverlap) {
    return (float) Math.pow((overlap / (float)maxOverlap), SOME_POWER);
  }

I'd first try values between 3.0f and 5.0f for SOME_POWER.
The searching code precomputes all coord values once per query per search,
so there is no need to worry about the computing efficiency.

This has the advantage that the other scoring factors are still used for
ranking. Since the other factors can vary quite a bit, it is difficult to
guarantee that any coord() implementation will provide a score that sorts
by the number of matching clauses. Higher powers as above can go a long
way, though.

Regards,
Paul Elschot

> Thanks,
> Gururaja
>
> Mike Snare wrote:
> I'm still new to Lucene, but wouldn't that be the coord()? My
> understanding is that the coord() is the fraction of the boolean query
> that matched a given document.
>
> Again, I'm new, so somebody else will have to confirm or deny...
>
> -Mike
>
> On Mon, 20 Dec 2004 00:33:21 -0800 (PST), Gururaja H wrote:
> > How to find out the percentages of matched terms in the document(s) using Lucene ?
> > Here is an example, of what i am trying to do:
> > The search query has 5 terms (ibm, risc, tape, drive, manual) and there are 4 matching
> > documents with the following attributes:
> > Doc#1: contains terms (ibm, drive)
> > Doc#2: contains terms (ibm, risc, tape, drive)
> > Doc#3: contains terms (ibm, risc, tape, drive)
> > Doc#4: contains terms (ibm, risc, tape, drive, manual).
> > The percentages displayed would be 100% (Doc#4), 80% (Doc#2), 80% (Doc#3) and 40%
> > (Doc#1).
> >
> > Any help on how to go about doing this ?
> >
> > Thanks,
> > Gururaja
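
To make the arithmetic in this thread concrete, here is a minimal standalone sketch in plain Java. It does not call Lucene at all: the class name CoordSketch, the helper methods, and the choice of 4.0f for SOME_POWER are assumptions made for illustration, and the term sets are simply the four example documents from the original question. It only shows how the default coord() fraction maps to the requested percentages and how raising it to a power widens the gap between partial and full matches.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CoordSketch {
    // Hypothetical exponent; the reply suggests trying 3.0f to 5.0f.
    static final float SOME_POWER = 4.0f;

    /** Same formula as the quoted DefaultSimilarity.coord(): overlap / maxOverlap. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** The coord fraction raised to SOME_POWER, as proposed in the reply. */
    static float poweredCoord(int overlap, int maxOverlap) {
        return (float) Math.pow(coord(overlap, maxOverlap), SOME_POWER);
    }

    /** Counts how many of the query terms occur in the document's term set. */
    static int overlap(List<String> queryTerms, Set<String> docTerms) {
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    /** Percentage of query terms matched, rounded to the nearest integer. */
    static int percentMatched(List<String> queryTerms, Set<String> docTerms) {
        return Math.round(100f * coord(overlap(queryTerms, docTerms), queryTerms.size()));
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("ibm", "risc", "tape", "drive", "manual");
        List<Set<String>> docs = Arrays.asList(
            new HashSet<>(Arrays.asList("ibm", "drive")),
            new HashSet<>(Arrays.asList("ibm", "risc", "tape", "drive")),
            new HashSet<>(Arrays.asList("ibm", "risc", "tape", "drive")),
            new HashSet<>(Arrays.asList("ibm", "risc", "tape", "drive", "manual")));

        for (int i = 0; i < docs.size(); i++) {
            int matched = overlap(query, docs.get(i));
            System.out.println("Doc#" + (i + 1)
                + ": " + percentMatched(query, docs.get(i)) + "% matched"
                + ", coord=" + coord(matched, query.size())
                + ", powered coord=" + poweredCoord(matched, query.size()));
        }
    }
}
```

In a real index the overlap would come from the scorer rather than from in-memory term sets, so treat this only as a check of the numbers, not as a way to obtain overlap and maxOverlap from Lucene itself.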