On Saturday 09 February 2008 01:59:12, Panos Konstantinidis wrote:
> Hello, I am a new Lucene user. I am trying to calculate the recall/precision of
> a query and I was wondering if Lucene provides an easy way to do it.
>
> Currently I have a number of documents that match a given query. Then I am
> doing a search and I am getting back all the Hits. I then divide the number of
> documents that came back from Lucene (the Hits size) by the number of
> documents that should have been returned. This is how I calculate the recall.
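Dividing the number of hits by the number of documents that should be found gives recall only when every hit is relevant. Recall counts only the relevant documents that were actually retrieved, and precision divides that same count by the number of retrieved documents. Here is a minimal, Lucene-independent sketch of these set-based definitions. It assumes each document is identified by a unique id string and that the retrieved list has no duplicates. The class and method names are hypothetical, chosen for this illustration.

```java
import java.util.List;
import java.util.Set;

/**
 * Set-based precision and recall for a single query (hypothetical helper).
 * "retrieved" holds the unique ids returned by the search,
 * "relevant" holds the ids known to be relevant for the query.
 */
public class PrecisionRecall {

    /** Counts retrieved ids that are also relevant; assumes no duplicate ids. */
    static int relevantRetrieved(List<String> retrieved, Set<String> relevant) {
        int count = 0;
        for (String id : retrieved) {
            if (relevant.contains(id)) {
                count++;
            }
        }
        return count;
    }

    /** recall = |relevant AND retrieved| / |relevant| */
    static double recall(List<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) relevantRetrieved(retrieved, relevant) / relevant.size();
    }

    /** precision = |relevant AND retrieved| / |retrieved| */
    static double precision(List<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        return (double) relevantRetrieved(retrieved, relevant) / retrieved.size();
    }

    public static void main(String[] args) {
        List<String> retrieved = List.of("d1", "d2", "d3", "d4");
        Set<String> relevant = Set.of("d1", "d3", "d7");
        // 2 of the 3 relevant docs were found; 2 of the 4 hits are relevant.
        System.out.println("recall    = " + recall(retrieved, relevant));
        System.out.println("precision = " + precision(retrieved, relevant));
    }
}
```

In the example, d2 and d4 are hits but not relevant, so counting all four hits would overstate recall.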

Since you're going to use all hits for the query, it is normally better to avoid
Hits and use a HitCollector or TopDocs instead.

> For precision I just get the hits.score() of each relevant document. I am not
> sure if I am on the right track or if there is an easier/better way to do it.
> I would appreciate any insight into this.

To use the score value for precision, one could define a cut-off value for the
score, but then the calculation for recall would also need to be adapted.
A HitCollector is a good fit for this.

In case you want the results sorted by decreasing score value, have a look at
the search methods that return TopDocs. From these, one can make a
precision/recall graph for the query by considering, for each cut-off, all
results that score higher than it.

When a lot of such computations are needed, you may also want to cache the
values of a unique identifier field for all indexed docs; have a look at
FieldCache for this.

Regards,
Paul Elschot
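The score cut-off idea for a precision/recall graph can be sketched in plain Java, independently of Lucene. The sketch assumes the results have already been collected into a list sorted by decreasing score, which is the order the TopDocs search methods return. It also assumes each hit has already been mapped to a unique id, for example through a stored id field or FieldCache. The Result and Point classes and the curve method are hypothetical names for this illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PrecisionRecallCurve {

    /** Hypothetical holder for one ranked result: a unique document id and its score. */
    static final class Result {
        final String id;
        final float score;

        Result(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Precision and recall when only results with score >= cutOff are kept. */
    static final class Point {
        final float cutOff;
        final double precision;
        final double recall;

        Point(float cutOff, double precision, double recall) {
            this.cutOff = cutOff;
            this.precision = precision;
            this.recall = recall;
        }
    }

    /**
     * Walks the ranked list once; assumes it is sorted by decreasing score.
     * Emits one point per distinct score, so tied results always fall on
     * the same side of the cut-off.
     */
    static List<Point> curve(List<Result> ranked, Set<String> relevant) {
        List<Point> points = new ArrayList<>();
        int relevantSoFar = 0;
        for (int i = 0; i < ranked.size(); i++) {
            Result r = ranked.get(i);
            if (relevant.contains(r.id)) {
                relevantSoFar++;
            }
            boolean lastWithThisScore =
                    i + 1 == ranked.size() || ranked.get(i + 1).score < r.score;
            if (lastWithThisScore) {
                int kept = i + 1; // results scoring >= r.score
                double precision = (double) relevantSoFar / kept;
                double recall = relevant.isEmpty()
                        ? 0.0
                        : (double) relevantSoFar / relevant.size();
                points.add(new Point(r.score, precision, recall));
            }
        }
        return points;
    }

    public static void main(String[] args) {
        List<Result> ranked = List.of(
                new Result("d1", 0.9f), new Result("d2", 0.8f),
                new Result("d3", 0.8f), new Result("d4", 0.5f));
        Set<String> relevant = Set.of("d1", "d3", "d7");
        for (Point p : curve(ranked, relevant)) {
            System.out.printf("score >= %.2f: precision %.3f, recall %.3f%n",
                    p.cutOff, p.precision, p.recall);
        }
    }
}
```

With the sample data this yields three points: cut-off 0.90 (precision 1.000, recall 0.333), 0.80 (0.667, 0.667) and 0.50 (0.500, 0.667). Recall never reaches 1 because d7 was not retrieved. Lowering the cut-off can only keep or raise recall, while precision may drop, which is the trade-off the graph shows.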