[jira] Issue Comment Edited: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene

Doron Cohen (JIRA) Wed, 25 Jul 2007 20:30:56 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515485 ]


Doron Cohen edited comment on LUCENE-965 at 7/25/07 8:29 PM:
-------------------------------------------------------------

Thanks for contributing this, Hui Fang! Very interesting.
I agree with Grant that we should be able to evaluate this in the context of 
LUCENE-836 - I hope to finalize it pretty soon. 
I looked into the patch and read the short paper referenced and I have a few 
comments:

1) Interestingly this too makes use of the average document length, as 
discussed in http://www.gossamer-threads.com/lists/lucene/java-dev/50537
2) The current patch seems outdated compared to trunk and also contains many 
changes that are not part of the proposed improvement. You need to run "svn 
update" to bring your checkout up to date with trunk (but run "svn stat -u" 
beforehand to see what is going to be updated and to check that there are no 
conflicts, and back up your code before that just in case...)
3) The AXSimilarity class itself is not included in the patch (note that 
you need to "svn add" the new files in order for "svn diff" to include them 
in the patch).
4) On first reading of the patch it seems that the average length is computed 
at search time for each scored term... right? This is good enough for 
evaluating this Similarity function, but a running system would need a 
better-performing method, like the one Hoss suggested in 
http://www.gossamer-threads.com/lists/lucene/java-dev/5053
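
To illustrate the concern in point 4, here is a minimal sketch in plain Java. 
It does not use the Lucene API and is not the method from the patch or the 
one Hoss suggested; the class AverageLengthCache and its methods are 
hypothetical names made up for this example. It only shows the general idea 
of computing the average document length once and reusing it for every 
scored term, instead of recomputing it per term.

```java
/**
 * Hypothetical helper (not part of Lucene): computes the average
 * document length lazily, once, and reuses the cached value afterwards.
 */
class AverageLengthCache {
    private final int[] docLengths;      // assumed input: one length per document
    private double average = Double.NaN; // NaN marks "not computed yet"
    private int computeCount = 0;        // counts how often the full pass ran

    AverageLengthCache(int[] docLengths) {
        this.docLengths = docLengths.clone();
    }

    /** Returns the average length; the pass over all documents runs at most once. */
    double average() {
        if (Double.isNaN(average)) {
            long total = 0;
            for (int len : docLengths) {
                total += len;
            }
            average = docLengths.length == 0 ? 0.0 : (double) total / docLengths.length;
            computeCount++;
        }
        return average;
    }

    int computeCount() {
        return computeCount;
    }

    public static void main(String[] args) {
        AverageLengthCache cache = new AverageLengthCache(new int[] {5, 6, 10});
        // Simulate scoring three terms: each one asks for the average length.
        for (int term = 0; term < 3; term++) {
            System.out.println("average length = " + cache.average());
        }
        System.out.println("full passes over the lengths: " + cache.computeCount());
    }
}
```

In a real index the cached value would have to be invalidated or updated 
when documents are added or deleted; the sketch leaves that out.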





> Implement a state-of-the-art retrieval function in Lucene
> ---------------------------------------------------------
>
>                 Key: LUCENE-965
>                 URL: https://issues.apache.org/jira/browse/LUCENE-965
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>            Reporter: Hui Fang
>         Attachments: axiomaticFunction.patch
>
>
> We implemented the axiomatic retrieval function, a state-of-the-art 
> retrieval function, to replace the default similarity function in Lucene. 
> We compared the performance of the two functions and reported the results at 
> http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf. 
> The report shows that the axiomatic retrieval function performs much better 
> than the default function: it finds more relevant documents, and users see 
> more relevant documents among the top-ranked results. Incorporating such a 
> state-of-the-art retrieval function could improve the search performance of 
> all applications built upon Lucene. 
> Most changes related to the implementation are made in AXSimilarity, 
> TermScorer and TermQuery.java. However, many test cases are hand coded to 
> check that the implementation of the default function is correct, so I 
> also modified many test files so that the new retrieval function passes 
> those cases. In fact, we found that some old test cases are not 
> reasonable. For example, in testQueries02 of TestBoolean2.java, 
> the query is "+w3 xx", and we have two documents, "w1 xx w2 yy w3" and 
> "w1 w3 xx w2 yy w3". 
> The second document should be more relevant than the first one, because it 
> has more occurrences of the query term "w3". But the original test case 
> requires us to rank the first document higher than the second one, which is 
> not reasonable. 
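
The counts behind the testQueries02 example can be checked with a small 
sketch in plain Java. It does not use Lucene and does not reproduce either 
scoring function; the class TermFrequencyExample and its method are 
hypothetical names for illustration. It only counts how often each query 
term occurs in the two documents quoted above, and how long each document is.

```java
import java.util.Arrays;

/** Hypothetical illustration (not Lucene code): raw term counts for the two test documents. */
class TermFrequencyExample {

    /** Counts whitespace-separated tokens in the document that equal the given term. */
    public static long termFrequency(String document, String term) {
        return Arrays.stream(document.trim().split("\\s+"))
                .filter(term::equals)
                .count();
    }

    /** Number of whitespace-separated tokens in the document. */
    public static int length(String document) {
        return document.trim().split("\\s+").length;
    }

    public static void main(String[] args) {
        String doc1 = "w1 xx w2 yy w3";
        String doc2 = "w1 w3 xx w2 yy w3";
        for (String doc : new String[] {doc1, doc2}) {
            System.out.println("\"" + doc + "\": length=" + length(doc)
                    + ", tf(w3)=" + termFrequency(doc, "w3")
                    + ", tf(xx)=" + termFrequency(doc, "xx"));
        }
    }
}
```

The second document contains "w3" twice and the first only once, while the 
second is also one term longer; how a similarity function trades these two 
facts against each other determines which document ranks first.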

