[jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene

Grant Ingersoll (JIRA) Wed, 25 Jul 2007 19:55:55 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515482 ]


Grant Ingersoll commented on LUCENE-965:
----------------------------------------

What do people make of this?  Interesting claims.  I haven't looked at the 
patch yet or read up on the Axiomatic retrieval model, but the precision 
numbers in the report are impressive.  I think it dovetails nicely with Doron 
and Chris' discussions on retrieval performance and making better efforts to 
gauge Lucene's retrieval effectiveness.  These numbers are for TREC, and that 
doesn't always correlate with the real world, but they are not to be 
discounted, either.

I think it would be cool to see a couple things out of this (at least):
1. contrib/benchmark algorithms that can be run before and after the patch, 
including LUCENE-836.  This would give everyone a way of easily evaluating it 
(assuming they have TREC data).  I would wait for 836 to be committed, though, 
as it is not final yet.
2. Search speed numbers comparing the two approaches.  That is, if it is 
significantly slower, then it probably isn't going to be the default way of 
doing things.

My gut reaction, if everything checks out of course, is to see how to 
factor it in as a separate querying mechanism, if possible like the Spans 
functionality, to give people the option of using it.  If the claims hold 
up in the wild and feedback is positive, then we could look at making it the 
default approach.  Not sure how feasible this is, though.

In the meantime, looks like I've got some reading to do...

Cheers,
Grant

> Implement a state-of-the-art retrieval function in Lucene
> ---------------------------------------------------------
>
>                 Key: LUCENE-965
>                 URL: https://issues.apache.org/jira/browse/LUCENE-965
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>            Reporter: Hui Fang
>         Attachments: axiomaticFunction.patch
>
>
> We implemented the axiomatic retrieval function, a state-of-the-art 
> retrieval function, to replace the default similarity function in Lucene. 
> We compared the performance of the two functions and reported the results 
> at http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf. 
> The report shows that the axiomatic retrieval function performs much better 
> than the default function: it finds more relevant documents, and users see 
> more relevant documents among the top-ranked results. Incorporating such a 
> state-of-the-art retrieval function could improve the search performance of 
> all the applications built upon Lucene. 
> Most changes related to the implementation are made in AXSimilarity, 
> TermScorer and TermQuery.java.  However, many test cases are hand-coded to 
> test whether the implementation of the default function is correct. Thus, I 
> also modified many test files so that the new retrieval function passes 
> those cases. In fact, we found that some old test cases are not reasonable. 
> For example, in testQueries02 of TestBoolean2.java, the query is "+w3 xx", 
> and we have two documents, "w1 xx w2 yy w3" and "w1 w3 xx w2 yy w3". 
> The second document should be more relevant than the first one, because it 
> has more occurrences of the query term "w3". But the original test case 
> would require us to rank the first document higher than the second one, 
> which is not reasonable. 
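
To make the term-frequency argument in the description concrete, here is a 
minimal sketch that only counts occurrences of "w3" in the two test documents. 
It is not part of axiomaticFunction.patch or of Lucene; the class and method 
names are hypothetical.  It deliberately ignores idf, length normalization, and 
everything else either similarity function does, so it illustrates the raw 
counts behind the argument, not the actual scores.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not from the patch or Lucene.
class TermFrequencySketch {

    // Counts whitespace-separated tokens in doc that equal term exactly.
    static int termFrequency(String doc, String term) {
        return (int) Arrays.stream(doc.trim().split("\\s+"))
                .filter(term::equals)
                .count();
    }

    public static void main(String[] args) {
        // The two documents quoted from testQueries02 in the description.
        String first = "w1 xx w2 yy w3";
        String second = "w1 w3 xx w2 yy w3";

        System.out.println("w3 in first:  " + termFrequency(first, "w3"));
        System.out.println("w3 in second: " + termFrequency(second, "w3"));
    }
}
```

The second document contains "w3" twice and the first only once, while "xx" 
occurs once in each.  Whether that should outweigh the second document's 
greater length is exactly what the two similarity functions disagree on.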

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

