Just my two cents,
I think what he meant by "single field" is the
following:

If the concept of "field" was introduced to
differentiate the significance of term occurrences in
different regions of a document (e.g., an occurrence
in the title is more important than one in the body),
that significance can alternatively be represented by
(or at least encoded in) an "impact" value, which is
assigned per occurrence of a term.
For example, if the base "impact" value for a term
occurrence is 1.0, you can assign an additional 0.5
to an occurrence in the title. You would then have an
impact of 1.5 for the title occurrence of that term
and 1.0 for the body occurrence.
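
To make the arithmetic concrete, here is a minimal sketch in Python
(not Lucene code). The 1.0 base and the 0.5 title bonus come from the
example above; everything else is my assumption for illustration: the
names occurrence_impact and build_impact_index are hypothetical, and
summing the per-occurrence impacts into one document-term score is just
one possible way to combine them.

```python
from collections import defaultdict

# Values from the example above: base impact of 1.0 per occurrence,
# plus an extra 0.5 when the occurrence is in the title.
BASE_IMPACT = 1.0
REGION_BONUS = {"title": 0.5, "body": 0.0}


def occurrence_impact(region):
    """Impact of one term occurrence in the given region of a document."""
    return BASE_IMPACT + REGION_BONUS.get(region, 0.0)


def build_impact_index(docs):
    """Build one posting list per term, regardless of region.

    docs maps doc_id -> {region_name: text}. Each posting is
    (doc_id, impact), where impact is the sum of the per-occurrence
    impacts of the term in that document (an assumed combination rule).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for doc_id, regions in docs.items():
        for region, text in regions.items():
            for term in text.lower().split():
                totals[term][doc_id] += occurrence_impact(region)
    # Order each posting list by descending impact, ties broken by doc id.
    return {
        term: sorted(postings.items(), key=lambda p: (-p[1], p[0]))
        for term, postings in totals.items()
    }


docs = {
    1: {"title": "lucene scoring", "body": "notes on index design"},
    2: {"title": "index design", "body": "lucene index formats"},
}
index = build_impact_index(docs)
print(index["lucene"])  # [(1, 1.5), (2, 1.0)]
print(index["index"])   # [(2, 2.5), (1, 1.0)]
```

Note that each term ends up with a single posting list, with the
title/body distinction folded into the stored impact. The cost is that
a search restricted to the title alone can no longer be answered from
this structure, which I think is the concern Jian raises.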

Does this make sense to you?

I feel that people should take a look at the
theoretical retrieval model first. It is not clear to
me that Lucene is fully a vector-space model. It seems
that many assumptions have been made about the
context of the discussion.

Michael

--- jian chen <[EMAIL PROTECTED]> wrote:

> Hi, Jeff,
> 
> I like the idea of impact based scoring. However, could you elaborate more
> on why we only need to use single field at search time?
>
> In Lucene, the indexed terms are field specific, and two terms, even if they
> are the same, are still different terms if they are of different fields.
>
> So, I think the multiple field scenario is still needed right? What if the
> user wants to search on both subject and content for emails, for example,
> and sometimes, only wants to search on subject, this type of tasks, without
> multiple fields, how this would be handled.
>
> I got lost on this, could any one educate?
> 
> Thanks,
> 
> Jian
> 
> On 1/9/07, Dalton, Jeffery <[EMAIL PROTECTED]> wrote:
> >
> > I'm not sure we fully understand one another, but I'll try to explain
> > what I am thinking.
> >
> > Yes, it has use after sorting.  It is used at query time for document
> > scoring in place of the TF and length norm components  (new scorers
> > would need to be created).
> >
> > Using an impact based index moves most of the scoring from query time to
> > index time (trades query time flexibility for greatly improved query
> > search performance).  Because the field boosts, length norm, position
> > boosts, etc... are incorporated into a single document-term-score, you
> > can use a single field at search time.  It allows one posting list per
> > query term instead of the current one posting list per field per query
> > term (MultiFieldQueryParser wouldn't be necessary in most cases).  In
> > addition to having fewer posting lists to examine, you often don't need
> > to read to the end of long posting lists when processing with a
> > score-at-a-time approach (see Anh/Moffat's Pruned Query Evaluation Using
> > Pre-Computed Impacts, SIGIR 2006) for details on one potential
> > algorithm.
> >
> > I'm not quite sure what you mean when mention leaving them out and
> > re-calculating them at merge time.
> >
> > - Jeff
> >
> > > -----Original Message-----
> > > From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, January 09, 2007 2:58 PM
> > > To: java-dev@lucene.apache.org
> > > Subject: Re: Beyond Lucene 2.0 Index Design
> > >
> > >
> > > On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
> > >
> > > > e. <impact, num_docs, (doc1,...docN)>
> > > > f. <impact, num_docs, ([doc1, freq ,<positions>],...[docN, freq ,<positions>])
> > >
> > > Does the impact have any use after it's used to sort the postings?
> > > Can we leave it out of the index format and recalculate at merge-time?
> > >
> > > Marvin Humphrey
> > > Rectangular Research
> > > http://www.rectangular.com/
> > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> >
> >
> >
> >
> 



 

