Hi, Jeff, I like the idea of impact-based scoring. However, could you elaborate on why we would only need to use a single field at search time?
In Lucene, indexed terms are field-specific: two terms with the same text are still different terms if they belong to different fields. So I think the multiple-field scenario is still needed, right? For example, a user may want to search both the subject and the content of emails, but sometimes only the subject. Without multiple fields, how would that kind of task be handled? I got lost on this; could anyone educate me?

Thanks,
Jian

On 1/9/07, Dalton, Jeffery <[EMAIL PROTECTED]> wrote:
I'm not sure we fully understand one another, but I'll try to explain what I am thinking. Yes, it has a use after sorting. It is used at query time for document scoring, in place of the TF and length norm components (new scorers would need to be created). Using an impact-based index moves most of the scoring from query time to index time, trading query-time flexibility for greatly improved search performance.

Because the field boosts, length norm, position boosts, etc. are incorporated into a single document-term score, you can use a single field at search time. This allows one posting list per query term instead of the current one posting list per field per query term (MultiFieldQueryParser wouldn't be necessary in most cases). In addition to having fewer posting lists to examine, you often don't need to read to the end of long posting lists when processing with a score-at-a-time approach. See Anh and Moffat, "Pruned Query Evaluation Using Pre-Computed Impacts" (SIGIR 2006), for details on one potential algorithm.

I'm not quite sure what you mean when you mention leaving them out and recalculating them at merge time.

- Jeff

> -----Original Message-----
> From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 09, 2007 2:58 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Beyond Lucene 2.0 Index Design
>
>
> On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
>
> > e. <impact, num_docs, (doc1,...docN)>
> > f. <impact, num_docs, ([doc1, freq, <positions>],...[docN, freq, <positions>])
>
> Does the impact have any use after it's used to sort the postings?
> Can we leave it out of the index format and recalculate at merge-time?
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
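The impact-based scheme Jeff describes can be sketched in Java (Lucene's language). This is a minimal, hypothetical sketch. It is not Lucene API and not the Anh/Moffat algorithm itself. The class `ImpactIndexSketch`, the boost-times-length-norm weighting, the x10 integer quantization, and the fixed `postingsBudget` stopping rule are all assumptions made for illustration. The sketch shows two things. First, field boosts are folded into one impact per (document, term) at index time, so each query term needs only one posting list. Second, score-at-a-time evaluation consumes postings in decreasing impact order and can stop before reaching the end of long lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of an impact-ordered index; not Lucene API.
public class ImpactIndexSketch {

    // A posting: document ID plus its precomputed, quantized impact.
    static final class Posting {
        final int docId;
        final int impact;

        Posting(int docId, int impact) {
            this.docId = docId;
            this.impact = impact;
        }
    }

    // One posting list per term (no field qualifier), highest impact first.
    private final Map<String, List<Posting>> index = new HashMap<>();

    // Index time: fold field boost and a length norm into one integer impact per
    // (document, term). The weighting and the x10 quantization are assumptions.
    public void addDocument(int docId, Map<String, String> fields, Map<String, Double> fieldBoosts) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String[] tokens = field.getValue().toLowerCase().trim().split("\\s+");
            double boost = fieldBoosts.getOrDefault(field.getKey(), 1.0);
            double lengthNorm = 1.0 / Math.sqrt(tokens.length);
            for (String token : tokens) {
                if (!token.isEmpty()) {
                    scores.merge(token, boost * lengthNorm, Double::sum);
                }
            }
        }
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            List<Posting> list = index.computeIfAbsent(e.getKey(), t -> new ArrayList<>());
            list.add(new Posting(docId, (int) Math.round(e.getValue() * 10)));
            list.sort((a, b) -> Integer.compare(b.impact, a.impact));
        }
    }

    // Query time, score-at-a-time: always consume the highest remaining impact across
    // all query terms and add it to that document's accumulator. Stopping after a fixed
    // postingsBudget is an assumed pruning rule, not necessarily Anh/Moffat's.
    public List<Integer> search(List<String> queryTerms, int k, int postingsBudget) {
        List<List<Posting>> lists = new ArrayList<>();
        for (String term : queryTerms) {
            lists.add(index.getOrDefault(term.toLowerCase(), new ArrayList<>()));
        }
        // Each cursor is {listIndex, position}, ordered by the impact it points at.
        PriorityQueue<int[]> cursors = new PriorityQueue<>((a, b) -> Integer.compare(
                lists.get(b[0]).get(b[1]).impact, lists.get(a[0]).get(a[1]).impact));
        for (int i = 0; i < lists.size(); i++) {
            if (!lists.get(i).isEmpty()) {
                cursors.add(new int[] {i, 0});
            }
        }
        Map<Integer, Integer> accumulators = new HashMap<>();
        int processed = 0;
        while (!cursors.isEmpty() && processed < postingsBudget) {
            int[] c = cursors.poll();
            Posting p = lists.get(c[0]).get(c[1]);
            accumulators.merge(p.docId, p.impact, Integer::sum);
            processed++;
            if (c[1] + 1 < lists.get(c[0]).size()) {
                cursors.add(new int[] {c[0], c[1] + 1});
            }
        }
        List<Integer> ranked = new ArrayList<>(accumulators.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(accumulators.get(b), accumulators.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    // Two hypothetical emails; subject matches are boosted 2x at index time.
    static ImpactIndexSketch demoIndex() {
        ImpactIndexSketch idx = new ImpactIndexSketch();
        Map<String, Double> boosts = new HashMap<>();
        boosts.put("subject", 2.0);
        Map<String, String> email1 = new HashMap<>();
        email1.put("subject", "lucene index design");
        email1.put("content", "notes on postings");
        idx.addDocument(1, email1, boosts);
        Map<String, String> email2 = new HashMap<>();
        email2.put("subject", "impact scoring");
        email2.put("content", "lucene lucene postings impact");
        idx.addDocument(2, email2, boosts);
        return idx;
    }

    public static void main(String[] args) {
        ImpactIndexSketch idx = demoIndex();
        System.out.println(idx.search(Arrays.asList("lucene", "impact"), 10, 100)); // [2, 1]
        System.out.println(idx.search(Arrays.asList("lucene", "impact"), 10, 1));   // [2]
    }
}
```

With a budget of 1, only the single highest-impact posting (email 2's entry for "impact") is read. This illustrates why long lists need not be traversed to the end. As Jeff notes, the approach trades query-time flexibility for speed. Once the subject boost is folded into the impact, this particular sketch has no way to restrict a query to the subject field alone.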