I have a couple of questions about the original post
of the new index design:

(1) Question on the posting list
> > f. <impact, num_docs, ([doc1, freq, <positions>],...[docN, freq, <positions>])
What is the "impact" per posting list? I am under the
impression that "impact" or "frequency" is per pair of
doc and term. 

It also seems that "impact" or "frequency" needs to be
stored for each doc on the posting list of a term, for
two reasons: to stop the traversal efficiently at some
point at search time by looking at the "impact" value,
and to get the component score without recalculation
at search time.
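
To make the two reasons concrete, here is a minimal sketch. It assumes
(this is my reading, not something confirmed in the thread) that a term's
postings are stored as blocks of (impact, doc_ids) in decreasing impact
order. The names ImpactBlock and top_k_single_term are hypothetical and
are not Lucene APIs.

```python
from typing import List, Tuple

# Hypothetical layout: one term's postings as blocks of (impact, doc_ids),
# sorted by decreasing impact. Illustrative only, not the Lucene format.
ImpactBlock = Tuple[int, List[int]]


def top_k_single_term(blocks: List[ImpactBlock], k: int) -> List[Tuple[int, int]]:
    """Return up to k (doc_id, impact) pairs, reading as few postings as possible."""
    hits: List[Tuple[int, int]] = []
    for impact, doc_ids in blocks:
        for doc_id in doc_ids:
            # The stored impact is used directly as the score component,
            # so nothing is recalculated at search time.
            hits.append((doc_id, impact))
            if len(hits) == k:
                # Early stop: everything after this point has an impact
                # that is lower than or equal to what we already have.
                return hits
    return hits


postings = [(9, [4, 17]), (5, [2, 8, 30]), (1, [1, 3, 5, 6])]
top3 = top_k_single_term(postings, 3)
```

With k = 3 the loop stops inside the second block and never touches the
block with impact 1.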


(2) I wonder whether Lucene is really based upon the
vector-space model. I am under the impression that the
hits are selected using a boolean model and only the
scoring (on the hit set) uses the vector-space model.

If so, the effect of the impact-sorted design on
boolean queries is not very positive.

For a query like "termA AND termB", I suppose the
posting lists of both A and B have to be fully
traversed, right? Partial traversal is only possible
for disjunctions or single-term queries. And the join
of the two posting lists will be more costly than on
the original docID-sorted posting lists.

(3) As to Jian's question below,
a phrase query is a special case of a conjunctive
boolean query.
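
As a minimal sketch of what I mean: first intersect the two lists (the
conjunctive step, which does not depend on the order in which docs appear
in either list), then check that the positions are adjacent. The
doc_id -> positions layout and the name phrase_match are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: doc_id -> term positions within that doc.
Positions = Dict[int, List[int]]


def phrase_match(term_a: Positions, term_b: Positions) -> List[int]:
    """Docs where term_b occurs at the position right after term_a."""
    hits: List[int] = []
    # Conjunctive step: only docs that contain both terms are candidates.
    for doc_id in sorted(term_a.keys() & term_b.keys()):
        b_positions = set(term_b[doc_id])
        # Adjacency step: some occurrence of A is directly followed by B.
        if any(pos + 1 in b_positions for pos in term_a[doc_id]):
            hits.append(doc_id)
    return hits


term_a = {1: [0, 7], 2: [3]}
term_b = {1: [1], 2: [9], 3: [4]}
matches = phrase_match(term_a, term_b)
```

Here doc 2 contains both terms but not next to each other, so only doc 1
matches. Ranking the matching docs would still be a separate step.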

Michael


--- jian chen <[EMAIL PROTECTED]> wrote:

> Hi, Jeff,
> 
> Also, how to handle the phrase based queries?
> 
> For example, here are two posting lists:
> 
> TermA: X Y
> TermB: Y X
> 
> I am not sure how you would return document X or Y for a search of the
> phrase "TermA TermB". Which should come first?
> 
> Thanks,
> 
> Jian
> 
> On 1/9/07, Dalton, Jeffery <[EMAIL PROTECTED]> wrote:
> >
> > I'm not sure we fully understand one another, but I'll try to explain
> > what I am thinking.
> >
> > Yes, it has use after sorting.  It is used at query time for document
> > scoring in place of the TF and length norm components (new scorers
> > would need to be created).
> >
> > Using an impact based index moves most of the scoring from query time to
> > index time (trades query time flexibility for greatly improved query
> > search performance).  Because the field boosts, length norm, position
> > boosts, etc... are incorporated into a single document-term-score, you
> > can use a single field at search time.  It allows one posting list per
> > query term instead of the current one posting list per field per query
> > term (MultiFieldQueryParser wouldn't be necessary in most cases).  In
> > addition to having fewer posting lists to examine, you often don't need
> > to read to the end of long posting lists when processing with a
> > score-at-a-time approach (see Anh/Moffat's Pruned Query Evaluation Using
> > Pre-Computed Impacts, SIGIR 2006, for details on one potential
> > algorithm).
> >
> > I'm not quite sure what you mean when you mention leaving them out and
> > re-calculating them at merge time.
> >
> > - Jeff
> >
> > > -----Original Message-----
> > > From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, January 09, 2007 2:58 PM
> > > To: java-dev@lucene.apache.org
> > > Subject: Re: Beyond Lucene 2.0 Index Design
> > >
> > >
> > > On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
> > >
> > > > e. <impact, num_docs, (doc1,...docN)>
> > > > f. <impact, num_docs, ([doc1, freq, <positions>],...[docN, freq, <positions>])
> > >
> > > Does the impact have any use after it's used to sort the postings?
> > > Can we leave it out of the index format and recalculate at merge-time?
> > >
> > > Marvin Humphrey
> > > Rectangular Research
> > > http://www.rectangular.com/



 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
