RE: Beyond Lucene 2.0 Index Design

Dalton, Jeffery Fri, 12 Jan 2007 05:45:54 -0800

The reason is performance on large collections.  The common case is that
users don't care what field they are searching -- they just want the
most relevant results, fast!  If you need to restrict querying to a
certain field only (subject, URL, etc.), you still need to index that
field separately.  However, for many applications you can get by with
one "body" field, with the field boosts integrated into the impact term
score.
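
A minimal sketch of what folding field boosts into one impact score could look like. The class ImpactBuilder, the FieldStats record and the boost * tf / sqrt(length) formula are hypothetical illustrations, not Lucene API and not the scoring formula proposed in this thread. The only point is that every field's contribution collapses into one number per (document, term) pair, which can then be stored in a single "body" posting list.

```java
import java.util.Map;

// Hypothetical sketch (not Lucene API): collapse per-field statistics for one
// term in one document into a single impact score, so the term needs only one
// posting list in a combined "body" field.
final class ImpactBuilder {

    /** Per-field statistics for one term in one document (hypothetical). */
    record FieldStats(int termFreq, int fieldLength, float fieldBoost) {}

    /**
     * Sums boost * tf / sqrt(fieldLength) over all fields. This formula is an
     * illustrative assumption, not Lucene's Similarity and not the thread's.
     */
    static float impact(Map<String, FieldStats> perField) {
        float score = 0f;
        for (FieldStats s : perField.values()) {
            if (s.termFreq() == 0 || s.fieldLength() == 0) {
                continue; // term absent from this field: no contribution
            }
            score += s.fieldBoost() * s.termFreq() / (float) Math.sqrt(s.fieldLength());
        }
        return score;
    }

    private ImpactBuilder() {}
}
```

Because the field contributions are combined at index time, a query term needs one posting-list lookup instead of one per field. The flip side is that a query restricted to, say, the subject field can no longer be answered from this combined list alone.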


The short answer is: if there are scenarios where you need field-restricted
search, consider doing both (a combined impact-scored field plus the
individual fields).
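
One way to read "doing both", sketched under the assumption that each parsed query term carries an optional field restriction: unrestricted terms go to the combined impact-scored field, and field-restricted terms go to conventional per-field postings. QueryRouter, Target and QueryTerm are hypothetical names used only for illustration.

```java
import java.util.Optional;

// Hypothetical sketch of "doing both": route each query term either to the
// combined impact-scored field or to conventional per-field postings.
final class QueryRouter {

    enum Target { COMBINED_IMPACT_FIELD, PER_FIELD_POSTINGS }

    /** A query term with an optional field restriction, e.g. subject:lucene. */
    record QueryTerm(Optional<String> field, String text) {}

    static Target route(QueryTerm term) {
        // Only field-restricted terms need field-specific postings; all other
        // terms use the single list whose impacts already include field boosts.
        return term.field().isPresent()
            ? Target.PER_FIELD_POSTINGS
            : Target.COMBINED_IMPACT_FIELD;
    }

    private QueryRouter() {}
}
```

The cost of this design is that each term is indexed twice, once in the combined field and once in its own field, in exchange for fast unrestricted queries.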

> -----Original Message-----
> From: jian chen [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, January 10, 2007 5:12 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Beyond Lucene 2.0 Index Design
> 
> Hi, Jeff,
> 
> I like the idea of impact-based scoring. However, could you
> elaborate on why we only need to use a single field at
> search time?
> 
> In Lucene, the indexed terms are field specific, and two 
> terms, even if they are the same, are still different terms 
> if they are of different fields.
> 
> So I think the multiple-field scenario is still needed,
> right? For example, what if a user wants to search both the
> subject and the content of emails, but sometimes only wants
> to search the subject? How would this type of task be
> handled without multiple fields?
> 
> I got lost on this; could anyone educate me?
> 
> Thanks,
> 
> Jian
> 
> On 1/9/07, Dalton, Jeffery <[EMAIL PROTECTED]> wrote:
> >
> > I'm not sure we fully understand one another, but I'll try to explain
> > what I am thinking.
> >
> > Yes, it has use after sorting.  It is used at query time for document
> > scoring in place of the TF and length norm components (new scorers
> > would need to be created).
> >
> > Using an impact-based index moves most of the scoring from query time
> > to index time (it trades query-time flexibility for greatly improved
> > search performance).  Because the field boosts, length norm, position
> > boosts, etc. are incorporated into a single document-term score, you
> > can use a single field at search time.  This allows one posting list
> > per query term instead of the current one posting list per field per
> > query term (MultiFieldQueryParser wouldn't be necessary in most
> > cases).  In addition to having fewer posting lists to examine, you
> > often don't need to read to the end of long posting lists when
> > processing with a score-at-a-time approach (see Anh and Moffat,
> > "Pruned Query Evaluation Using Pre-Computed Impacts", SIGIR 2006, for
> > details on one potential algorithm).
> >
> > I'm not quite sure what you mean when you mention leaving them out
> > and recalculating them at merge time.
> >
> > - Jeff
> >
> > > -----Original Message-----
> > > From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, January 09, 2007 2:58 PM
> > > To: java-dev@lucene.apache.org
> > > Subject: Re: Beyond Lucene 2.0 Index Design
> > >
> > >
> > > On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
> > >
> > > > e. <impact, num_docs, (doc1, ..., docN)>
> > > > f. <impact, num_docs, ([doc1, freq, <positions>], ..., [docN, freq, <positions>])>
> > >
> > > Does the impact have any use after it's used to sort the postings?
> > > Can we leave it out of the index format and recalculate at merge-time?
> > >
> > > Marvin Humphrey
> > > Rectangular Research
> > > http://www.rectangular.com/
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> 
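
A simplified sketch of score-at-a-time evaluation over postings stored in the impact-grouped form <impact, num_docs, (doc1,...docN)> from the quoted thread. Segments from all query terms are processed in decreasing impact order, and their impacts are simply added into per-document accumulators, so no TF or length norm is computed at query time. The fixed posting budget used as the stopping rule is an illustrative assumption, much cruder than the pruning described by Anh and Moffat, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: score-at-a-time evaluation over impact-ordered postings.
final class ScoreAtATime {

    /** One impact segment: every doc in it shares the same precomputed impact. */
    record Segment(int impact, int[] docs) {}

    /**
     * Processes segments from all terms, highest impact first, adding impacts
     * into per-document accumulators. Stops after postingBudget postings, so
     * the low-impact tails of long lists are never read.
     */
    static Map<Integer, Integer> evaluate(List<List<Segment>> termPostings, int postingBudget) {
        PriorityQueue<Segment> queue =
            new PriorityQueue<>(Comparator.comparingInt(Segment::impact).reversed());
        for (List<Segment> segments : termPostings) {
            queue.addAll(segments);
        }
        Map<Integer, Integer> accumulators = new HashMap<>();
        int processed = 0;
        while (!queue.isEmpty() && processed < postingBudget) {
            Segment seg = queue.poll();
            for (int doc : seg.docs()) {
                // The impact already contains boosts and norms: just add it.
                accumulators.merge(doc, seg.impact(), Integer::sum);
                processed++;
                if (processed >= postingBudget) {
                    break;
                }
            }
        }
        return accumulators;
    }

    /** Returns the top-k doc ids by accumulated score, ties broken by doc id. */
    static List<Integer> topK(Map<Integer, Integer> acc, int k) {
        List<Integer> docs = new ArrayList<>(acc.keySet());
        docs.sort((a, b) -> {
            int byScore = Integer.compare(acc.get(b), acc.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return docs.subList(0, Math.min(k, docs.size()));
    }

    private ScoreAtATime() {}
}
```

With a budget smaller than the total number of postings, the low-impact segments at the ends of long lists are skipped, which is where the speedup comes from. The trade-off is that documents appearing only in those tails may be missed or under-scored.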
