Hi Ivan, it sounds to me like you are going about it the right way.
I too have complained about different document/topic formats before, at
least with non-TREC test collections that claim to be in TREC format.

Here is a description of what I do, for what it's worth.

1. If you use the trunk benchmark code, it will now parse Descriptions and
Narratives in addition to Titles, so you can run TD and TDN queries as well.
While I think title-only (T) queries are generally the only interesting
case, since users typically type just a few short words into a search box,
the TD and TDN runs are sometimes useful for comparison. To do this you
will have to either change SimpleQQParser or write your own parser that
simply builds a BooleanQuery over Title + Description + Narrative (or
whatever combination you want).
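
Something along these lines should work -- a rough, untested sketch; I'm
assuming the topics reader stores the parts under the names "title",
"description", and "narrative", so check that against your version:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.benchmark.quality.QualityQuery;
import org.apache.lucene.benchmark.quality.QualityQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

/** ORs the requested topic parts into one query against the index field. */
public class TDNQQParser implements QualityQueryParser {
  private final String[] qqNames;   // e.g. {"title", "description"} for TD
  private final String indexField;  // e.g. "TEXT"

  public TDNQQParser(String[] qqNames, String indexField) {
    this.qqNames = qqNames;
    this.indexField = indexField;
  }

  public Query parse(QualityQuery qq) throws ParseException {
    QueryParser qp = new QueryParser(Version.LUCENE_CURRENT, indexField,
        new StandardAnalyzer(Version.LUCENE_CURRENT));
    BooleanQuery bq = new BooleanQuery();
    for (String name : qqNames) {
      String text = qq.getValue(name);
      if (text != null && text.length() > 0) {
        // escape so stray TREC punctuation doesn't trip up the query parser
        bq.add(qp.parse(QueryParser.escape(text)), BooleanClause.Occur.SHOULD);
      }
    }
    return bq;
  }
}

Then e.g. new TDNQQParser(new String[] {"title", "description"}, "TEXT")
gives you the TD run.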

2. Another thing I usually test is query expansion with MoreLikeThis, all
defaults, using the top 5 returned docs. I do this for T, TD, and TDN,
which gives 6 different MAP measures in total (unexpanded and expanded for
each). You can see a recent example where I applied all 6 measures here:
https://issues.apache.org/jira/browse/LUCENE-2234 . I feel these 6 measures
give a better overall picture of any relative relevance improvement: in
that example the unexpanded T run improved by 75%, but the other 5 improved
by only 40-50%. While unexpanded T is theoretically the most realistic
measure to me, I feel it's a bit fragile and sensitive, and that example
shows why.
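
Roughly, the idea is something like this (just a sketch, assuming your
index field is "TEXT"): OR the original query together with MoreLikeThis
queries built from each of the top 5 hits, then re-run it:

import java.io.IOException;

import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similar.MoreLikeThis;   // contrib/queries

public class MltExpansion {
  /** Expand 'original' with MLT queries (all defaults) from its top 5 hits. */
  public static Query expand(IndexSearcher searcher, Query original)
      throws IOException {
    TopDocs initial = searcher.search(original, 5);

    MoreLikeThis mlt = new MoreLikeThis(searcher.getIndexReader());
    mlt.setFieldNames(new String[] { "TEXT" });

    BooleanQuery expanded = new BooleanQuery();
    expanded.add(original, BooleanClause.Occur.SHOULD);
    for (ScoreDoc sd : initial.scoreDocs) {
      expanded.add(mlt.like(sd.doc), BooleanClause.Occur.SHOULD);
    }
    return expanded;   // re-run this for the "expanded" T/TD/TDN measures
  }
}

One caveat: like(int) pulls terms from term vectors if you have them,
otherwise from the stored field, so "TEXT" needs to be stored or have term
vectors for the expansion to find anything.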

<I can contribute code to make it easier to do the above two things if you
think it would be useful, just haven't gotten around to it>

3. I don't even bother with the 'summary output' that the Lucene benchmark
package prints out; instead I simply use the benchmark package to run the
queries and generate the trec_top_file (submission.txt), which I then hand
to trec_eval.
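
For what it's worth, a driver for that needs only a few lines with the
quality package classes -- here is a rough sketch, where the topic/qrels
paths and the doc-name field ("docname") are placeholders for however you
built your index:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;

import org.apache.lucene.benchmark.quality.Judge;
import org.apache.lucene.benchmark.quality.QualityBenchmark;
import org.apache.lucene.benchmark.quality.QualityQuery;
import org.apache.lucene.benchmark.quality.QualityQueryParser;
import org.apache.lucene.benchmark.quality.trec.TrecJudge;
import org.apache.lucene.benchmark.quality.trec.TrecTopicsReader;
import org.apache.lucene.benchmark.quality.utils.SimpleQQParser;
import org.apache.lucene.benchmark.quality.utils.SubmissionReport;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class RunTrecTopics {
  public static void main(String[] args) throws Exception {
    PrintWriter logger = new PrintWriter(System.out, true);

    // topics and qrels (file names are placeholders)
    TrecTopicsReader qReader = new TrecTopicsReader();
    QualityQuery[] qqs = qReader.readQueries(
        new BufferedReader(new FileReader("topics.151-200.txt")));
    Judge judge = new TrecJudge(
        new BufferedReader(new FileReader("qrels.151-200.txt")));
    judge.validateData(qqs, logger);

    IndexSearcher searcher =
        new IndexSearcher(FSDirectory.open(new File("index")), true);
    QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");

    // run the queries; SubmissionReport writes the trec_top_file
    QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, "docname");
    PrintWriter submission = new PrintWriter(new FileWriter("submission.txt"));
    qrun.execute(judge, new SubmissionReport(submission, "lucene"), logger);
    submission.close();
    searcher.close();
  }
}

and then:  trec_eval qrels.151-200.txt submission.txt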


On Wed, Jan 27, 2010 at 1:36 PM, Ivan Provalov <iprov...@yahoo.com> wrote:

> Robert, Grant:
>
> Thank you for your replies.
>
> Our goal is to fine-tune our existing system to perform better on
> relevance.
>
> I agree with Robert's comment that these collections are not completely
> compatible.  Yes, it is possible that the results will vary somewhat
> depending on the differences between the collections.  The reason for us
> picking the TREC-3 TIPSTER collection is that our production content
> overlaps with some TIPSTER documents.
>
> Any suggestions on how to obtain TREC-3-comparable results with Lucene, or
> on how to select a better approach, would be appreciated.
>
> We are doing this project in three stages:
>
> 1. Test Lucene's "vanilla" performance to establish the baseline.  We want
> to iron out issues such as topic or document formats.  For example, we had
> to add a different parser and clean up the topic title.  This will give us
> confidence that we are using the data and the methodology correctly.
>
> 2. Fine-tune Lucene based on the latest research findings (TREC by E.
> Voorhees, conference proceedings, etc...).
>
> 3. Repeat these steps with our production system, which runs on Lucene.  The
> reason we are doing this step last is to ensure that our overall system
> doesn't introduce relevance issues (content pre-processing steps, query
> parsing steps, etc...).
>
> Thank you,
>
> Ivan Provalov
>
> --- On Wed, 1/27/10, Robert Muir <rcm...@gmail.com> wrote:
>
> > From: Robert Muir <rcm...@gmail.com>
> > Subject: Re: Average Precision - TREC-3
> > To: java-user@lucene.apache.org
> > Date: Wednesday, January 27, 2010, 11:16 AM
> > Hello, forgive my ignorance here (I have not worked with these English
> > TREC collections), but is the TREC-3 test collection the same as the
> > test collection used in the 2007 paper you referenced?
> >
> > It looks like that is a different collection; it's not really possible
> > to compare these relevance scores across different collections.
> >
> > On Wed, Jan 27, 2010 at 11:06 AM, Grant Ingersoll <gsing...@apache.org
> >wrote:
> >
> > >
> > > On Jan 26, 2010, at 8:28 AM, Ivan Provalov wrote:
> > >
> > > > We are looking into making some improvements to relevance ranking
> > > > of our search platform based on Lucene.  We started by running the
> > > > Ad Hoc TREC task on the TREC-3 data using "out-of-the-box" Lucene.
> > > > The reason to run this old TREC-3 (TIPSTER Disk 1 and Disk 2; topics
> > > > 151-200) data was that the content is matching the content of our
> > > > production system.
> > > >
> > > > We are currently getting average precision of 0.14.  We found some
> > > > format issues with the TREC-3 data which were causing an even lower
> > > > score.  For example, the initial average precision number was 0.09.
> > > > We discovered that the topics included the word "Topic:" in the
> > > > <title> tag.  For example, "<title> Topic:  Coping with overcrowded
> > > > prisons".  By removing this term from the queries, we bumped the
> > > > average precision to 0.14.
> > >
> > > There's usually a lot of this involved in running TREC.  I've also
> > > seen a good deal of improvement from things like using phrase queries
> > > and the Dismax Query Parser in Solr (which uses DisjunctionMaxQuery in
> > > Lucene, amongst other things) and by playing around with length
> > > normalization.
> > >
> > >
> > > >
> > > > Our query is based on the title tag of the topic and the index field
> > > > is based on the <TEXT> tag of the document.
> > > >
> > > > QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
> > > >
> > > > Is there an average precision number which "out-of-the-box" Lucene
> > > > should be close to?  For example, this IBM 2007 TREC paper mentions
> > > > 0.154:
> > > > http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
> > >
> > > Hard to say.  I can't say I've run TREC 3.  You might ask over on the
> > > Open Relevance list too (http://lucene.apache.org/openrelevance).  I
> > > know Robert Muir's done a lot of experiments with Lucene on standard
> > > collections like TREC.
> > >
> > > I guess the bigger question back to you is what is your goal?  Is it
> > > to get better at TREC or to actually tune your system?
> > >
> > > -Grant
> > >
> > >
> > > --------------------------
> > > Grant Ingersoll
> > > http://www.lucidimagination.com/
> > >
> > > Search the Lucene ecosystem using Solr/Lucene:
> > > http://www.lucidimagination.com/search
> > >
> >
> >
> > --
> > Robert Muir
> > rcm...@gmail.com
> >
>


-- 
Robert Muir
rcm...@gmail.com
