[ https://issues.apache.org/jira/browse/LUCENE-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-836:
-------------------------------

    Attachment: lucene-836.benchmark.quality.patch

lucene-836.benchmark.quality.patch adds a new package "quality" under
o.a.l.benchmark. This is also a follow-up to some of
http://www.mail-archive.com/java-dev@lucene.apache.org/msg10851.html

The patch is based on the trunk folder. The fastest way to test it is
"ant test" from the contrib/benchmark dir. To see more output in this
run, try "ant test -Dtests.verbose=true".

This is early code, not ready to commit - I wanted to show it sooner for
feedback, especially on the API. For a quick view of the API see
benchmark.quality at http://people.apache.org/~doronc/api (note that
there are not many javadocs yet - I would rather wait with those until
the API is settled).

Code in this patch:
- is extendable.
- can run a quality benchmark.
- reports quality results, comparing them to given judgments (optional).
- creates a submission log (optional).
- the format of the submission log can be modified by extending a logger class.
- the format of the inputs - queries, judgments - can be modified by
  extending the default readers, or by providing pre-read ones.

There is a general "Judge" interface, answering whether a given doc name
is valid (relevant) for a given "QualityQuery", and one implementation of
it, based on TREC's QRels. An alternative judgments format - TRels, for
instance - would simply mean another implementation of the "Judge"
interface. (I would love a better name for it, btw...)

A new TestQualityRun tests this package on the Reuters collection, so
that test's source is a good place to start to see how to run a quality
test. (Two rough API sketches are also appended at the end of this
message, below the quoted issue description.)

> Benchmarks Enhancements (precision/recall, TREC, Wikipedia)
> -----------------------------------------------------------
>
>                 Key: LUCENE-836
>                 URL: https://issues.apache.org/jira/browse/LUCENE-836
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Grant Ingersoll
>            Priority: Minor
>         Attachments: lucene-836.benchmark.quality.patch
>
>
> It would be great if the benchmark contrib had a way of providing
> precision/recall benchmark information a la TREC. I don't know what the
> copyright issues are for the TREC queries/data (I think the queries are
> available, but not sure about the data), so I am not sure whether this
> is even feasible, but I could imagine we could at least incorporate
> support for it for those who have access to the data. It has been a long
> time since I have participated in TREC, so perhaps someone more familiar
> with the latest can fill in the blanks here.
> Another option is to ask for volunteers to create queries and make
> judgments for the Reuters data, but that is a bit more complex and
> probably not necessary. Even so, an Apache-licensed set of benchmarks
> may be useful for the community as a whole. Hmmm....
> Wikipedia might be another option instead of Reuters to set up as a
> download for benchmarking, as it is quite large and I believe the
> licensing terms are quite amenable. Having a larger collection would be
> good for stressing Lucene more, and would give many users a
> demonstration of how Lucene handles large collections.
> At any rate, this kind of information could be useful for people looking
> at different indexing schemes, formats, payloads and different query
> strategies.
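To make the API feedback concrete, this is roughly what the "Judge"
contract looks like - a sketch only; the method names shown here
(isRelevant, validateData, maxRecall) are from the current draft and are
exactly the kind of thing that may still change:

    package org.apache.lucene.benchmark.quality;

    import java.io.PrintWriter;

    /** Decides whether a document is relevant (valid) for a quality query. */
    public interface Judge {

      /** Was document docName judged relevant for the given quality query? */
      boolean isRelevant(String docName, QualityQuery query);

      /** Validate the judgment data against the given quality queries,
       *  e.g. verify that each query has at least one judged document;
       *  problems are reported to the logger. */
      boolean validateData(QualityQuery[] qq, PrintWriter logger);

      /** Number of documents judged relevant for the query, i.e. the
       *  maximal recall attainable for it. */
      int maxRecall(QualityQuery query);
    }

TrecJudge, reading TREC QRels from a BufferedReader, is the one
implementation in the patch; a TRels-based judge would just be another
class implementing this interface.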
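And this is a minimal sketch of wiring up a full quality run, in the
spirit of TestQualityRun. The class and method names (TrecTopicsReader,
TrecJudge, SimpleQQParser, SubmissionReport, QualityBenchmark,
QualityStats) are from the current draft and may change with the API
feedback; the input file names, the index path and the "docid" doc-name
field are made up for illustration:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintWriter;

    import org.apache.lucene.benchmark.quality.Judge;
    import org.apache.lucene.benchmark.quality.QualityBenchmark;
    import org.apache.lucene.benchmark.quality.QualityQuery;
    import org.apache.lucene.benchmark.quality.QualityQueryParser;
    import org.apache.lucene.benchmark.quality.QualityStats;
    import org.apache.lucene.benchmark.quality.trec.TrecJudge;
    import org.apache.lucene.benchmark.quality.trec.TrecTopicsReader;
    import org.apache.lucene.benchmark.quality.utils.SimpleQQParser;
    import org.apache.lucene.benchmark.quality.utils.SubmissionReport;
    import org.apache.lucene.search.IndexSearcher;

    public class QualityRunSketch {
      public static void main(String[] args) throws Exception {
        PrintWriter logger = new PrintWriter(System.out, true);

        // Read the quality queries - TREC topics format by default;
        // other formats mean extending the reader or pre-reading them.
        QualityQuery[] qqs = new TrecTopicsReader().readQueries(
            new BufferedReader(new FileReader("reuters.topics.txt")));

        // A Judge backed by TREC QRels; verify it covers the queries.
        Judge judge = new TrecJudge(
            new BufferedReader(new FileReader("reuters.qrels.txt")));
        judge.validateData(qqs, logger);

        // How a QualityQuery becomes a Lucene Query - here the topic
        // "title" is parsed against the index's "body" field.
        QualityQueryParser qqParser = new SimpleQQParser("title", "body");

        IndexSearcher searcher = new IndexSearcher("path/to/reuters/index");

        // Optional submission log, TREC submission format by default;
        // extend the logger class for a different format.
        SubmissionReport submitLog = new SubmissionReport(logger, "sketchRun");

        // Run, judging results by the doc name taken from "docid".
        QualityBenchmark qrun =
            new QualityBenchmark(qqs, qqParser, searcher, "docid");
        QualityStats[] stats = qrun.execute(judge, submitLog, logger);

        // Report precision/recall averaged over all queries.
        QualityStats avg = QualityStats.average(stats);
        avg.log("SUMMARY", 2, logger, "  ");
      }
    }

TestQualityRun does essentially this against the Reuters collection, plus
assertions on the resulting stats - again, the best place to start.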