[jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene

Michael Busch (JIRA) Thu, 26 Jul 2007 12:38:43 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515840 ]


Michael Busch commented on LUCENE-965:
--------------------------------------

> Can we store the "document length" (with multiple fields) and "average
> document length" as the payload data at document level and index level
> respectively? The current payload is designed at term level, is that right?
> If we want to store something at document and index level, do we
> necessarily have to change the Lucene file format?

You are right: currently we can only store payloads per term occurrence, not at
the doc level. However, it is possible to simply add a special term that occurs
exactly once in every document and carries a payload; then you have one payload
per doc.
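To make the special-term trick concrete, here is a minimal, Lucene-free Java sketch. It assumes a made-up reserved term `_doclen` and a toy in-memory posting list instead of Lucene's real indexing classes. Each document's length is written as a 4-byte payload on that term's single occurrence, so the term's posting list holds exactly one payload per document; the average document length is then derived by scanning that list (computed at read time here, not stored at index level).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "one special term per document": every document gets
 * exactly one occurrence of a reserved term, and that occurrence carries a payload.
 * Not Lucene's API; all names here are illustrative.
 */
class DocLevelPayloads {
    static final String SPECIAL_TERM = "_doclen"; // hypothetical reserved term name

    /** One posting of the special term: a document id plus its payload bytes. */
    static final class Posting {
        final int docId;
        final byte[] payload;

        Posting(int docId, byte[] payload) {
            this.docId = docId;
            this.payload = payload;
        }
    }

    // Posting list of the special term; documents are added in increasing docId order.
    private final List<Posting> postings = new ArrayList<>();

    /** Indexes a document by recording its token count as a 4-byte payload. */
    void addDocument(int docId, String[] tokens) {
        byte[] payload = ByteBuffer.allocate(4).putInt(tokens.length).array();
        postings.add(new Posting(docId, payload));
    }

    /** Decodes the document length stored in the special term's payload. */
    int docLength(int docId) {
        for (Posting p : postings) {
            if (p.docId == docId) {
                return ByteBuffer.wrap(p.payload).getInt();
            }
        }
        throw new IllegalArgumentException("no such document: " + docId);
    }

    /** Average document length, obtained by scanning every payload of the special term. */
    double averageDocLength() {
        long total = 0;
        for (Posting p : postings) {
            total += ByteBuffer.wrap(p.payload).getInt();
        }
        return postings.isEmpty() ? 0.0 : (double) total / postings.size();
    }
}
```

Indexing the two documents from the issue description quoted below ("w1 xx w2 yy w3" and "w1 w3 xx w2 yy w3") yields stored lengths of 5 and 6 and an average of 5.5.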

Coincidentally, I am currently testing how much search performance would suffer
if we stored the norms as payloads in the posting lists. At search time we would
then not buffer the norms but read them on demand from the prx file. This is
probably somewhat slower than buffering the norms, but it has a lot of
advantages, such as much simpler code and less memory consumption by the
IndexReader. Since all norms would then be stored in a single posting list, I'm
hoping that the FS cache will keep search performance from suffering too much.
Multi-level skipping should help too. We'll see: I'm currently building an index
with norms as payloads and should have some numbers tonight or tomorrow.
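The on-demand idea can be sketched roughly as follows. This is a hypothetical Java illustration, not Lucene's IndexReader or prx-file code: all norms live in one posting list (one posting per document, with the encoded norm byte as its payload), and a forward-only cursor fetches a document's norm when the scorer reaches it. A simple single-level skip interval stands in for Lucene's multi-level skipping, and `bufferAll` shows the buffered alternative of holding one norm byte per document in memory.

```java
/**
 * Hypothetical sketch: norms stored as payloads of a single posting list and read
 * on demand, versus buffering them all in memory. Names are illustrative only.
 */
class NormsAsPayloads {
    private final int[] docIds;     // documents in the norms posting list, ascending
    private final byte[] payloads;  // encoded norm byte carried by each posting
    private final int skipInterval; // single-level skip distance (stand-in for multi-level skips)
    private int cursor = 0;         // forward-only position, like a postings enumerator

    NormsAsPayloads(int[] docIds, byte[] payloads, int skipInterval) {
        if (docIds.length != payloads.length || skipInterval <= 0) {
            throw new IllegalArgumentException("bad posting list or skip interval");
        }
        this.docIds = docIds;
        this.payloads = payloads;
        this.skipInterval = skipInterval;
    }

    /** Reads the norm for target on demand; targets must be requested in increasing order. */
    byte normFor(int target) {
        // Jump whole blocks while the posting skipInterval ahead is still <= target.
        while (cursor + skipInterval < docIds.length && docIds[cursor + skipInterval] <= target) {
            cursor += skipInterval;
        }
        // Finish with a short linear scan inside the current block.
        while (cursor < docIds.length && docIds[cursor] < target) {
            cursor++;
        }
        if (cursor < docIds.length && docIds[cursor] == target) {
            return payloads[cursor];
        }
        throw new IllegalArgumentException("no norm posting for doc " + target);
    }

    /** The buffered alternative: materialize one norm byte per document up front. */
    static byte[] bufferAll(int[] docIds, byte[] payloads, int maxDoc) {
        byte[] norms = new byte[maxDoc];
        for (int i = 0; i < docIds.length; i++) {
            norms[docIds[i]] = payloads[i];
        }
        return norms;
    }
}
```

The trade-off mirrors the comment above: `normFor` needs no per-document array but pays for a posting-list walk, which skipping shortens because queries visit documents in increasing docId order.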

> Implement a state-of-the-art retrieval function in Lucene
> ---------------------------------------------------------
>
>                 Key: LUCENE-965
>                 URL: https://issues.apache.org/jira/browse/LUCENE-965
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>            Reporter: Hui Fang
>         Attachments: axiomaticFunction.patch
>
>
> We implemented the axiomatic retrieval function, which is a state-of-the-art
> retrieval function, to replace the default similarity function in Lucene. We
> compared the performance of these two functions and reported the results at
> http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf.
> The report shows that the performance of the axiomatic retrieval function is
> much better than that of the default function: the axiomatic retrieval
> function finds more relevant documents, and users see more relevant documents
> among the top-ranked results. Incorporating such a state-of-the-art retrieval
> function could improve the search performance of all applications built upon
> Lucene.
> Most changes related to the implementation are made in AXSimilarity,
> TermScorer and TermQuery.java. However, many test cases are hand-coded to
> check whether the implementation of the default function is correct, so I
> also modified many test files to make the new retrieval function pass those
> cases. In fact, we found that some old test cases are not reasonable. For
> example, in testQueries02 of TestBoolean2.java, the query is "+w3 xx", and we
> have two documents, "w1 xx w2 yy w3" and "w1 w3 xx w2 yy w3". The second
> document should be more relevant than the first one because it has more
> occurrences of the query term "w3", but the original test case would require
> us to rank the first document higher than the second one, which is not
> reasonable.
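For readers unfamiliar with axiomatic retrieval, the Java sketch below scores documents with one published axiomatic function, the F2-EXP form: S(Q,D) = sum over t in Q∩D of c(t,Q) · (N/df(t))^k · c(t,D) / (c(t,D) + s + s·|D|/avdl). Whether AXSimilarity in axiomaticFunction.patch uses exactly this form is an assumption, and the class name, method signature and parameter values (k = 0.35, s = 0.5 in the usage example) are illustrative only. The sketch ignores the `+` (required) operator and scores "+w3 xx" as the bag of terms {w3, xx}. Note that it needs |D| and avdl, which is where the document-level payloads discussed above come in.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical F2-EXP-style axiomatic scorer; not the code from the patch. */
class AxiomaticF2Exp {
    /**
     * Sums, over query terms that also occur in the document,
     * c(t,Q) * (N/df(t))^k * c(t,D) / (c(t,D) + s + s*|D|/avdl).
     */
    static double score(List<String> query, List<String> doc, Map<String, Integer> df,
                        int numDocs, double avdl, double k, double s) {
        Map<String, Integer> queryCounts = counts(query);
        Map<String, Integer> docCounts = counts(doc);
        double score = 0.0;
        for (Map.Entry<String, Integer> e : queryCounts.entrySet()) {
            Integer tf = docCounts.get(e.getKey());
            if (tf == null) {
                continue; // only terms in Q ∩ D contribute
            }
            double idfPart = Math.pow((double) numDocs / df.get(e.getKey()), k);
            double tfPart = tf / (tf + s + s * doc.size() / avdl); // length-normalized tf
            score += e.getValue() * idfPart * tfPart;
        }
        return score;
    }

    /** Counts occurrences of each token. */
    static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : tokens) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }
}
```

On a toy collection containing only the two testQueries02 documents (so every df is 2 and both idf factors equal 1), the second document scores higher than the first: its extra occurrence of "w3" outweighs the slightly larger length penalty, which matches the reporter's argument.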
