Thank you, Robert. --- On Wed, 2/10/10, Robert Muir <rcm...@gmail.com> wrote:
> From: Robert Muir <rcm...@gmail.com> > Subject: Re: TREC Data and Topic-Specific Index > To: java-user@lucene.apache.org > Date: Wednesday, February 10, 2010, 9:23 AM > Hi, so you mean around 15% and 24% > respectively? i think you could fairly > say either of these is an improvement over your baseline of > 0.141 > > what i mean by large difference, is while I think its safe > to say that using > either of these methods improves over your baseline, i am > not sure you can > conclude that either improvement is better than the other, > > you can apply various statistical tests to try to figure > this out, but > because you didn't participate in the pool with these runs, > you would have > to be careful about drawing conclusions as to which > similarity is best, as > there is some bias and error involved. > > On Wed, Feb 10, 2010 at 9:14 AM, Ivan Provalov <iprov...@yahoo.com> > wrote: > > > Robert, > > > > Thank you for your reply. What would be > considered a large difference? We > > started applying the Sweet Spot Similarity. It > gives us an improvement of > > 0.163-0.141=0.022 MAP so far. LnbLtcSimilarity > gets us more improvement: > > 0.175-0.141=0.034. > > > > Thanks, > > > > Ivan > > > > --- On Sun, 2/7/10, Robert Muir <rcm...@gmail.com> > wrote: > > > > > From: Robert Muir <rcm...@gmail.com> > > > Subject: Re: TREC Data and Topic-Specific Index > > > To: java-user@lucene.apache.org > > > Date: Sunday, February 7, 2010, 10:59 PM > > > you should do (a), and pretend you > > > know nothing about the relevance > > > judgements up front. > > > > > > it is true you might make some change to your > search engine > > > and wonder, how > > > is it fair that I am bringing back possibly > relevant docs > > > that were never > > > judged (and thus scored implicitly as > non-relevant)? i.e. > > > the test > > > collection is biased against you because you did > not > > > participate in the > > > pooling process. > > > > > > if you are concerned about this, you should still > use (a), > > > but perhaps look > > > at other measures such as bpref ( > > > > > http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/sigirBuckley2004.pdf<http://comminfo.rutgers.edu/%7Emuresan/IR/Docs/Articles/sigirBuckley2004.pdf> > > ). > > > > > > personally, I simply prefer to stick with MAP. > And with all > > > measures, > > > whether you look at bpref or map, my advice is to > only > > > consider large > > > differences only when evaluating some potential > > > improvement! > > > > > > On Sun, Feb 7, 2010 at 6:49 PM, Ivan Provalov > <iprov...@yahoo.com> > > > wrote: > > > > > > > Robert, > > > > > > > > We are using TREC-3 data and Ad Hoc topics > > > 151-200. The relevance > > > > judgments list contains 97,319 entries, of > which > > > 68,559 are unique document > > > > ids. The TIPSTER collection which was > used in > > > TREC-3 is around 750,000 > > > > documents. > > > > > > > > Should we (a) index the entire 750,000 > document > > > collection, or (b) the > > > > document collection of the 68,559 unique > documents > > > listed in the qrels, or > > > > (c) should we limit our index to each > specific topic > > > (about 2,000 docs) i.e. > > > > to the documents listed for a particular > topic in the > > > qrels? > > > > > > > > Thanks, > > > > > > > > Ivan > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > > > > > > > > > > -- > > > Robert Muir > > > rcm...@gmail.com > > > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > -- > Robert Muir > rcm...@gmail.com > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org