Re: Beyond Lucene 2.0 Index Design

jian chen Wed, 10 Jan 2007 14:12:29 -0800

Hi, Jeff,

I like the idea of impact-based scoring. However, could you elaborate on
why we only need a single field at search time?


In Lucene, the indexed terms are field-specific: two terms with the same
text are still different terms if they belong to different fields.
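
To make that concrete, here is a toy sketch in plain Python (not Lucene
code; the `postings` dictionary and the `index` helper are made up for
illustration). Postings are keyed by field name plus term text, so the same
word in two fields gets two separate posting lists:

```python
from collections import defaultdict

# Hypothetical toy index: postings are keyed by (field, text), mirroring
# the fact that a Lucene term is a field name plus the term text.
postings = defaultdict(list)

def index(doc_id, field, text):
    """Add doc_id to the posting list of every (field, token) pair."""
    for token in text.lower().split():
        key = (field, token)
        if doc_id not in postings[key]:
            postings[key].append(doc_id)

index(1, "subject", "Index design")
index(1, "content", "Notes on scoring")
index(2, "content", "Index merge policy")

# Same text, different fields: two independent posting lists.
subject_hits = postings[("subject", "index")]
content_hits = postings[("content", "index")]
print(subject_hits, content_hits)
```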

So I think the multiple-field scenario is still needed, right? For example,
what if the user sometimes wants to search both the subject and the content
of emails, and at other times only the subject? How would that kind of task
be handled without multiple fields?

I'm lost on this. Could anyone explain?

Thanks,

Jian

On 1/9/07, Dalton, Jeffery <[EMAIL PROTECTED]> wrote:

I'm not sure we fully understand one another, but I'll try to explain
what I am thinking.

Yes, it has a use after sorting. It is used at query time for document
scoring in place of the TF and length norm components (new scorers
would need to be created).

Using an impact-based index moves most of the scoring from query time to
index time (it trades query-time flexibility for greatly improved query
performance). Because the field boosts, length norm, position boosts,
etc. are incorporated into a single document-term score, you can use a
single field at search time. That allows one posting list per query term
instead of the current one posting list per field per query term
(MultiFieldQueryParser wouldn't be necessary in most cases). In addition
to having fewer posting lists to examine, you often don't need to read to
the end of long posting lists when processing with a score-at-a-time
approach (see Anh/Moffat's Pruned Query Evaluation Using Pre-Computed
Impacts, SIGIR 2006, for details on one potential algorithm).
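
A rough sketch of the idea, in Python rather than Lucene code. Everything
in it is an assumption made for illustration: the boost values, the
tf/length-norm formula, the 16 impact levels, and the names
`build_impact_index` and `search`. The `max_postings` cut-off is only a
crude stand-in for early termination, not the Anh/Moffat algorithm.

```python
import heapq
import math
from collections import defaultdict

# Assumed per-field boosts; the values are illustrative only.
FIELD_BOOST = {"subject": 3.0, "content": 1.0}

def build_impact_index(docs, levels=16):
    """Fold boosts, tf and length norm into one integer impact per (term, doc)."""
    raw = defaultdict(lambda: defaultdict(float))  # term -> doc_id -> raw score
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            tokens = text.lower().split()
            if not tokens:
                continue
            norm = 1.0 / math.sqrt(len(tokens))  # simple length norm
            for tok in tokens:  # each occurrence adds to the score (tf)
                raw[tok][doc_id] += FIELD_BOOST.get(field, 1.0) * norm
    if not raw:
        return {}
    top = max(s for by_doc in raw.values() for s in by_doc.values())
    index = {}
    for term, by_doc in raw.items():
        blocks = defaultdict(list)  # impact -> doc ids
        for doc_id, score in by_doc.items():
            impact = max(1, math.ceil(levels * score / top))  # quantize
            blocks[impact].append(doc_id)
        # One posting list per term, no field in the key. Each block is
        # <impact, num_docs, (doc1, ... docN)>, highest impact first.
        index[term] = [(imp, len(ids), sorted(ids))
                       for imp, ids in sorted(blocks.items(), reverse=True)]
    return index

def search(index, query, k=10, max_postings=None):
    """Score-at-a-time: consume blocks from all terms in decreasing impact order."""
    lists = [index.get(term, []) for term in query.lower().split()]
    merged = heapq.merge(*lists, key=lambda block: -block[0])
    scores = defaultdict(int)
    seen = 0
    for impact, num_docs, doc_ids in merged:
        if max_postings is not None and seen >= max_postings:
            break  # budget spent; every remaining block has lower or equal impact
        for doc_id in doc_ids:
            scores[doc_id] += impact
        seen += num_docs
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

docs = {
    1: {"subject": "index design", "content": "notes on scoring"},
    2: {"subject": "meeting", "content": "index merge policy and index format"},
    3: {"subject": "lunch", "content": "no search terms here"},
}
impact_index = build_impact_index(docs)
full = search(impact_index, "index")
pruned = search(impact_index, "index", max_postings=1)
print(full, pruned)
```

The subject match in document 1 outranks the content matches in document 2
because the subject boost was baked into the impact at index time, and the
pruned run stops after the highest-impact block without reading the rest of
the list. The trade-off is that a subject-only search is no longer possible
from this list alone.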

I'm not quite sure what you mean when you mention leaving them out and
recalculating them at merge time.

- Jeff

> -----Original Message-----
> From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 09, 2007 2:58 PM
> To: [email protected]
> Subject: Re: Beyond Lucene 2.0 Index Design
>
>
> On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
>
> > e. <impact, num_docs, (doc1,...docN)>
> > f. <impact, num_docs, ([doc1, freq, <positions>],...[docN, freq,
> > <positions>])>
>
> Does the impact have any use after it's used to sort the postings?
> Can we leave it out of the index format and recalculate at merge-time?
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

