[ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516547 ]
Doron Cohen commented on LUCENE-965:
------------------------------------

> Is there a way to plug in a patch into my local source repository, so I can
> diff with my favorite diff tool?

    patch -p 0 < foo.patch

Try it with --dry-run first. If you are using Eclipse, another convenient option is the Subclipse plugin, which lets you visually diff a patch just before applying it.

> But may I suggest the alternative?

I think you have a valid point here. I too don't understand the proposed "Axiomatic Retrieval Function" (ARF) in this regard. In Lucene, the norm value stored for a document (assuming all boosts are 1) is

    norm(D) = 1 / sqrt(numTerms(D))

This value is ready to use at scoring time, where it is multiplied by tf(t in d) · idf(t)^2, as described in
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html

Now, the ARF paper at http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf describes Lucene scoring using |D| in place of norm(D) above, and describes ARF scoring using |D| again, which seems to be how this patch implements it (e.g. in TermScorer). However, the paper defines |D| as "the length of D". I find this confusing: "|D|" usually means the number of words in a document, and "avgdl" the average of all |D|'s in the collection (see for instance "Okapi BM25" in Wikipedia).

Your proposed change, on the other hand, is something I can understand: it first translates norm(D) back into Length(D) (ignoring boosts), and only then averages it.

In any case, whether this gets fixed or I am wrong and an explanation shows why no fix is needed, I have to admit I still don't understand the logic behind ARF: intuitively, why would it be better? Provable search quality results could help persuade... (LUCENE-836 is resolved, btw.)
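A minimal sketch of why the order of operations in the proposed change matters, assuming norm(D) = 1 / sqrt(numTerms(D)) with all boosts equal to 1 and exact floating-point norms. In a real index Lucene stores each norm in a single lossy byte, so a length recovered this way would only be approximate. The class, method names and document lengths below are hypothetical illustrations, not part of Lucene's API.

```java
import java.util.Arrays;

/**
 * Sketch: averaging Lucene-style norms is not the same as averaging document lengths.
 * Assumes norm(D) = 1 / sqrt(numTerms(D)) with all boosts 1, and ignores the lossy
 * one-byte norm encoding (exact doubles are used here).
 */
public class NormVsLength {

    /** Default-style length norm, assuming boosts of 1. */
    static double norm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Inverts the norm: since norm = 1 / sqrt(len), len = 1 / norm^2. */
    static double lengthFromNorm(double norm) {
        return 1.0 / (norm * norm);
    }

    /** avgdl in the BM25 sense: convert each norm back to a length, then average. */
    static double avgdl(double[] norms) {
        return Arrays.stream(norms).map(NormVsLength::lengthFromNorm).average().orElse(0.0);
    }

    /** The other order: average the norms first, then convert the mean norm to a length. */
    static double lengthOfAverageNorm(double[] norms) {
        double meanNorm = Arrays.stream(norms).average().orElse(0.0);
        return lengthFromNorm(meanNorm);
    }

    public static void main(String[] args) {
        int[] lengths = {4, 100}; // hypothetical document lengths
        double[] norms = Arrays.stream(lengths).mapToDouble(NormVsLength::norm).toArray();
        System.out.printf("convert, then average: %.2f%n", avgdl(norms));
        System.out.printf("average, then convert: %.2f%n", lengthOfAverageNorm(norms));
    }
}
```

For these two hypothetical documents (4 and 100 terms, norms 0.5 and 0.1), converting first gives the true average length of 52, while averaging the norms first (mean 0.3) and converting gives about 11.1, far from any meaningful avgdl.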
> Implement a state-of-the-art retrieval function in Lucene
> ---------------------------------------------------------
>
>                 Key: LUCENE-965
>                 URL: https://issues.apache.org/jira/browse/LUCENE-965
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>            Reporter: Hui Fang
>         Attachments: axiomaticFunction.patch
>
>
> We implemented the axiomatic retrieval function, which is a state-of-the-art
> retrieval function, to replace the default similarity function in Lucene. We
> compared the performance of these two functions and reported the results at
> http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf.
> The report shows that the performance of the axiomatic retrieval function is
> much better than that of the default function. The axiomatic retrieval
> function is able to find more relevant documents, and users can see more
> relevant documents among the top-ranked results. Incorporating such a
> state-of-the-art retrieval function could improve the search performance of
> all the applications built upon Lucene.
> Most changes related to the implementation are made in AXSimilarity,
> TermScorer and TermQuery.java. However, many test cases are hand coded to
> test whether the implementation of the default function is correct, so I
> also modified many test files to make the new retrieval function pass those
> cases. In fact, we found that some old test cases are not reasonable. For
> example, in testQueries02 of TestBoolean2.java, the query is "+w3 xx", and
> we have two documents, "w1 xx w2 yy w3" and "w1 w3 xx w2 yy w3".
> The second document should be more relevant than the first one, because it
> has more occurrences of the query term "w3". But the original test case
> would require us to rank the first document higher than the second one,
> which is not reasonable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
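To make the TestBoolean2 example concrete, here is a small sketch that counts term frequencies and lengths for the two quoted documents. As a stand-in scorer it uses the Okapi BM25 saturated-tf component mentioned above, which is neither Lucene's default similarity nor the axiomatic function from the patch; k1 = 1.2 and b = 0.75 are assumed common defaults, avgdl is taken over just these two documents, and idf is deliberately left out. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: term statistics for the two quoted TestBoolean2 documents, plus a
 * BM25-style tf component (Okapi BM25, NOT the axiomatic function from the patch).
 * K1 and B are assumed defaults; idf is omitted on purpose.
 */
public class TestCaseTermStats {

    static final double K1 = 1.2;  // assumed BM25 tf-saturation parameter
    static final double B = 0.75;  // assumed BM25 length-normalization parameter

    /** Counts how often each whitespace-separated term occurs in a document. */
    static Map<String, Integer> termFreqs(String doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc.split("\\s+")) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    /** BM25 tf component (without idf) for a term occurring tf times in a document of length len. */
    static double bm25Tf(int tf, int len, double avgdl) {
        double lengthNorm = K1 * (1 - B + B * len / avgdl);
        return tf * (K1 + 1) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        String[] docs = {"w1 xx w2 yy w3", "w1 w3 xx w2 yy w3"};
        int[] lens = new int[docs.length];
        double total = 0;
        for (int i = 0; i < docs.length; i++) {
            lens[i] = docs[i].split("\\s+").length;
            total += lens[i];
        }
        double avgdl = total / docs.length; // average length over these two documents only
        for (int i = 0; i < docs.length; i++) {
            Map<String, Integer> tf = termFreqs(docs[i]);
            System.out.printf("doc %d: |D|=%d tf(w3)=%d tf(xx)=%d  w3 part=%.3f  xx part=%.3f%n",
                i, lens[i], tf.get("w3"), tf.get("xx"),
                bm25Tf(tf.get("w3"), lens[i], avgdl), bm25Tf(tf.get("xx"), lens[i], avgdl));
        }
    }
}
```

With idf left out, the w3 part favors the second document (two occurrences) while the xx part slightly favors the first, shorter one; which document ranks higher overall then depends on how the scoring function weights the two terms and normalizes for length, which is exactly what the disputed test case pins down.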