Doug Cutting wrote:
  > What did you think of my DensityPhraseQuery proposal?

It is a step in the direction of what I have in mind, but I'd like to go
further.  How about a query class with these properties:
  1.  Inputs are:
      a.  F = list of fields
      b.  B = list of field boosts (1:1 correspondence with F)
      c.  T = list of terms or phrases, each either optional or required
      d.  P = proximity-sloping window
  2.  Generate matches that contain every required T in some F or, if
      there are no required T's, at least one optional T in some F.
  3.  Score matches based on these considerations:
      a.  Normal TermQuery and PhraseQuery scores for individual matches
          in individual fields.
      b.  Boost scores for proximity of TermQuery and PhraseQuery
          matches in individual fields, based on some function of P (term
          proximity).
      c.  Boost scores based on number of optional T's matched in at
          least one F (term diversity).
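
To make the proposal concrete, here is a rough toy model of properties 2 and 3 in Python. It is only a sketch and not Lucene code: the function names, the use of boosted term frequency as a stand-in for the real TermQuery score, the linear proximity bonus, and the 0.5 diversity factor are all assumptions made up for illustration. Phrases are left out, and the terms are assumed to be distinct single tokens.

```python
from itertools import combinations

def positions(tokens, term):
    """Return every index at which `term` occurs in `tokens`."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def score(doc, fields, boosts, terms, window):
    """Toy score for one document; 0.0 means the document does not match.

    doc    -- {field name: list of tokens}
    fields -- F, list of field names
    boosts -- B, one boost per field (same order as fields)
    terms  -- T, list of (term, required) pairs; terms assumed distinct
    window -- P, largest token gap (> 0) that still earns a proximity bonus
    """
    pos = {f: {t: positions(doc.get(f, []), t) for t, _ in terms} for f in fields}
    found = {t for f in fields for t, _ in terms if pos[f][t]}
    required = [t for t, req in terms if req]
    optional = [t for t, req in terms if not req]

    # Property 2: every required term in some field, or, with no
    # required terms, at least one optional term in some field.
    if required:
        if not all(t in found for t in required):
            return 0.0
    elif not found:
        return 0.0

    total = 0.0
    for f, boost in zip(fields, boosts):
        # 3a (assumed stand-in): boosted term frequency per field.
        total += boost * sum(len(p) for p in pos[f].values())
        # 3b (assumed function of P): bonus shrinks linearly with the
        # smallest gap between each pair of terms found in this field.
        present = [t for t, p in pos[f].items() if p]
        for t1, t2 in combinations(present, 2):
            gap = min(abs(i - j) for i in pos[f][t1] for j in pos[f][t2])
            if gap <= window:
                total += boost * (1.0 - (gap - 1) / window)

    # 3c (assumed factor): reward each optional term matched anywhere.
    total *= 1.0 + 0.5 * sum(1 for t in optional if t in found)
    return total

FIELDS, BOOSTS = ["title", "body"], [4.0, 1.0]
TERMS = [("lucene", True), ("scoring", False)]
near = {"body": ["lucene", "scoring"]}
far = {"body": ["lucene"] + ["x"] * 10 + ["scoring"]}
print(score(near, FIELDS, BOOSTS, TERMS, 5), score(far, FIELDS, BOOSTS, TERMS, 5))
```

In this toy model the document with the two terms adjacent outscores the one where they are eleven tokens apart, and a document missing the required term scores 0.0.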

I think that meets all the objectives of my earlier posts.  I'd like to
have it, and would be happy to contribute it if it sounds like the right

Is there a better way?

  > If field boosting needs to then trump idf, we should be able to deal
  > with that when we subsequently tune field boosting, no?  We can,
  > square the field boosts if we need.

Perhaps, but that seems to me to be a hack on top of a hack.  Current
literature seems to consistently not square idf -- I found one reference
that specifically says even Salton removed the squaring after he first
proposed it a long time ago.  The simpler solution is just to remove the
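
A small worked example of the interaction between idf squaring and field boosts, in Python. This is a deliberately stripped-down sketch: the score is just boost times idf, ignoring tf, length norms and the other factors in the real formula, and the boost and idf values are made up for illustration.

```python
def toy_score(idf, field_boost, idf_power=1, boost_power=1):
    """Toy score: (field boost ** boost_power) * (idf ** idf_power)."""
    return (field_boost ** boost_power) * (idf ** idf_power)

# Made-up values: a common term matched in the title versus a rare
# term matched in the body.
TITLE_BOOST, BODY_BOOST = 4.0, 1.0
COMMON_IDF, RARE_IDF = 2.0, 6.0

def compare(idf_power, boost_power):
    """Return (title match score, body match score) for the given powers."""
    title = toy_score(COMMON_IDF, TITLE_BOOST, idf_power, boost_power)
    body = toy_score(RARE_IDF, BODY_BOOST, idf_power, boost_power)
    return title, body

print("idf unsquared:          ", compare(1, 1))
print("idf squared:            ", compare(2, 1))
print("idf and boosts squared: ", compare(2, 2))
```

With these numbers the title match wins when idf is not squared (8 vs. 6), loses once idf is squared (16 vs. 36), and wins again only if the field boosts are squared as well (64 vs. 36).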


  > -----Original Message-----
  > From: Doug Cutting [mailto:[EMAIL PROTECTED]
  > Sent: Monday, January 31, 2005 3:04 PM
  > To: Lucene Developers List
  > Subject: Re: URL to compare 2 Similarity's ready-- Re: Scoring
  > evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher
  > problems with Similarity.docFreq() ?
  > Chuck Williams wrote:
  > > That expansion is scalable, but it only accounts for proximity of
  > > query terms together.  E.g., it does not favor a match where t1 and t2
  > > are close together while t3 is distant over a match where all 3
  > > are distant.  Worse, it would not favor a match with t1 and t2 in
  > > short title, and t2 and t3 proximal in the content (with no
  > > of t1 in the content) vs. a match with t1 and t2 in the title and
  > > t2 and t3 distant in the content.
  > Right.  I just mentioned this same weakness in a message replying to
  > David.
  > >   > Is that distinct from my goal to develop an improved
  > >   > MultiFieldQueryParser for Lucene 2.0?
  > >
  > > Not distinct, but I think the first step is to decide on the
  > > we want.  Unless somebody has a better idea, I think the best
  > > is a new Query class that simultaneously supports multiple fields, term
  > > diversity and term proximity.  It would be similar to SpansQuery,
  > > generalized.  It would be like BooleanQuery in the sense that individual
  > > query clauses could be required or not.  Then, default AND could be
  > > achieved by expanding queries to all-required.
  > >
  > > With this new Query class, revised versions of QueryParser and
  > > MultiFieldQueryParser would generate it.
  > >
  > > Am I way off-base somewhere and/or is there a simpler approach to the
  > > same end?
  > It just sounds like a lot to bite off at once.
  > What did you think of my DensityPhraseQuery proposal?  We could use it
  > in place of a PhraseQuery w/ slop=infinity.  We'd need just one per
  > field.
  > The straight boolean clauses are required for two reasons:
  >    1. To make sure that every query term appears in some field; and
  >    2. To reward a term that occurs frequently in a field, but near
  > other query terms.
  > > Sure, idf is important enough to evaluate independently as a
  > > However, I do not think these considerations are orthogonal.  For
  > > example, I'm putting a lot of weight in field boosting and don't want
  > > the preference of title matches over body matches to be overwhelmed by
  > > the idf's.
  > If field boosting needs to then trump idf, we should be able to deal
  > with that when we subsequently tune field boosting, no?  We can,
  > square the field boosts if we need.
  > Doug
  > To unsubscribe, e-mail: [EMAIL PROTECTED]
  > For additional commands, e-mail: [EMAIL PROTECTED]
