On Saturday 09 February 2008 01:59:12, Panos Konstantinidis wrote:
> Hello, I am a new Lucene user. I am trying to calculate the recall/precision
> of a query and I was wondering if Lucene provides an easy way to do it.
> 
> Currently I have a number of documents that match a given query. Then I am
> doing a search and I am getting back all the Hits. I then divide the number of
> documents that came back from Lucene (the Hits size) by the number of
> documents that I should have got. This is how I calculate the recall.

Since you're going to use all hits for the query, it is normally better to avoid
Hits and use a HitCollector or a TopDocs.
 
> For precision I just get the hits.score() of each relevant document. I am not
> sure if I am on the right track or if there is an easier/better way to do it.
> I would appreciate any insight into this.

To use the score value for precision, one could define a cutoff value for
the score, but then the calculation for recall would also need to be
adapted to use the same cutoff. A HitCollector would be good for this.
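
As a rough sketch of the cutoff idea: precision is the fraction of the hits
at or above the cutoff that are relevant, and recall is the fraction of all
relevant documents that appear among those hits. The code below is plain Java
and does not call Lucene. The arrays stand in for the (doc, score) pairs a
HitCollector would receive, and the class name, method name and document ids
are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CutoffPrecisionRecall {

    /**
     * Returns {precision, recall} for the hits whose score is >= cutoff.
     * docIds[i] and scores[i] describe the i-th hit (hypothetical input,
     * e.g. gathered by a HitCollector); relevant holds the ids of the
     * documents that should match the query.
     */
    public static double[] precisionRecall(int[] docIds, float[] scores,
                                           Set<Integer> relevant, float cutoff) {
        int retrieved = 0;          // hits at or above the cutoff
        int relevantRetrieved = 0;  // of those, the ones that are relevant
        for (int i = 0; i < docIds.length; i++) {
            if (scores[i] >= cutoff) {
                retrieved++;
                if (relevant.contains(docIds[i])) {
                    relevantRetrieved++;
                }
            }
        }
        // Define both measures as 0 when the denominator is empty.
        double precision = retrieved == 0
                ? 0.0 : (double) relevantRetrieved / retrieved;
        double recall = relevant.isEmpty()
                ? 0.0 : (double) relevantRetrieved / relevant.size();
        return new double[] { precision, recall };
    }

    public static void main(String[] args) {
        int[] docIds = { 7, 3, 9, 4 };                 // made-up hits
        float[] scores = { 0.9f, 0.7f, 0.4f, 0.2f };   // made-up scores
        Set<Integer> relevant = new HashSet<>(Arrays.asList(7, 9, 5, 8));
        double[] pr = precisionRecall(docIds, scores, relevant, 0.5f);
        System.out.println("precision=" + pr[0] + " recall=" + pr[1]);
    }
}
```

Note that both numbers use the same set of hits above the cutoff, which is
why the recall calculation changes together with the precision one.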

In case you want the results sorted by decreasing score value, have
a look at the search methods that return TopDocs. From these results
one can make a precision/recall graph for the query by considering
all results with a score higher than a given value.
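
A minimal sketch of such a graph, again in plain Java without Lucene calls:
it assumes the document ids are already ordered by decreasing score (as the
entries of a TopDocs would be) and emits one precision/recall point per rank.
Ties in score are not treated specially here, and the class and method names
are only illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionRecallCurve {

    /**
     * rankedDocIds must be sorted by decreasing score. Returns one
     * {precision, recall} point for each prefix of length k = 1..n.
     */
    public static List<double[]> curve(int[] rankedDocIds, Set<Integer> relevant) {
        List<double[]> points = new ArrayList<>();
        int relevantSoFar = 0;  // relevant documents among the top k
        for (int k = 1; k <= rankedDocIds.length; k++) {
            if (relevant.contains(rankedDocIds[k - 1])) {
                relevantSoFar++;
            }
            double precision = (double) relevantSoFar / k;
            double recall = relevant.isEmpty()
                    ? 0.0 : (double) relevantSoFar / relevant.size();
            points.add(new double[] { precision, recall });
        }
        return points;
    }

    public static void main(String[] args) {
        int[] ranked = { 7, 3, 9, 4 };  // made-up ids, best score first
        Set<Integer> relevant = new HashSet<>(Arrays.asList(7, 9));
        List<double[]> points = curve(ranked, relevant);
        for (int k = 0; k < points.size(); k++) {
            double[] p = points.get(k);
            System.out.println("top " + (k + 1)
                    + ": precision=" + p[0] + " recall=" + p[1]);
        }
    }
}
```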

When a lot of such computations are needed, you may also want
to cache the values of a unique identifier field for all indexed docs;
have a look at FieldCache for this.

Regards,
Paul Elschot
