Chuck Williams wrote:
That expansion is scalable, but it only accounts for proximity of all
query terms together.  E.g., it does not favor a match where t1 and t2
are close together while t3 is distant over a match where all 3 terms
are distant.  Worse, it would not favor a match with t1 and t2 in a
short title, and t2 and t3 proximal in the content (with no occurrence
of t1 in the content) vs. a match with t1 and t2 in the title and t2 and
t3 distant in the content.
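To make the first weakness concrete, here is a toy sketch in Java. It assumes each query term occurs exactly once in a single field, and it compares two scores. The first depends only on the window covering all terms, much as a sloppy phrase match is down-weighted by its match distance. The second is a hypothetical pairwise score. The class and method names are illustrative, not Lucene API.

```java
public class ProximityDemo {
    // Toy model (hypothetical): positions[i] is the single position of query term i.
    // Width of the window covering all terms = max position - min position.
    static int allTermsSpan(int[] positions) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int p : positions) {
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        return max - min;
    }

    // Sloppy-phrase-style weight that looks only at the whole window: 1 / (span + 1).
    static double allTermsScore(int[] positions) {
        return 1.0 / (allTermsSpan(positions) + 1);
    }

    // Hypothetical pairwise alternative: sum of 1 / (distance + 1) over every pair of terms.
    static double pairwiseScore(int[] positions) {
        double sum = 0.0;
        for (int i = 0; i < positions.length; i++) {
            for (int j = i + 1; j < positions.length; j++) {
                sum += 1.0 / (Math.abs(positions[i] - positions[j]) + 1);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docA = {0, 1, 50};   // t1 and t2 adjacent, t3 far away
        int[] docB = {0, 25, 50};  // all three terms far apart
        System.out.printf("all-terms: A=%.4f B=%.4f%n", allTermsScore(docA), allTermsScore(docB));
        System.out.printf("pairwise:  A=%.4f B=%.4f%n", pairwiseScore(docA), pairwiseScore(docB));
    }
}
```

Documents A and B tie on the all-terms score because both windows span 50 positions. The pairwise score ranks A higher because t1 and t2 are adjacent there.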

Right. I just mentioned this same weakness in a message replying to David.

  > Is that distinct from my goal to develop an improved
  > MultiFieldQueryParser for Lucene 2.0?

Not distinct, but I think the first step is to decide on the expansion
we want.  Unless somebody has a better idea, I think the best solution
is a new Query class that simultaneously supports multiple fields, term
diversity and term proximity.  It would be similar to SpanQuery, but
generalized.  It would be like BooleanQuery in the sense that individual
query clauses could be required or not.  Then, default AND could be
achieved by expanding queries to all-required.
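Below is a minimal sketch of the matching semantics just described. It assumes a document is modeled as a map from field name to token list. Clause, allRequired, matches and diversity are hypothetical names, not Lucene classes. Proximity scoring is deliberately left out, so the sketch shows only required vs. optional clauses across multiple fields, the default-AND expansion, and a crude term-diversity count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ClauseSketch {
    // Hypothetical clause: a query term plus a flag saying whether it must match.
    static final class Clause {
        final String term;
        final boolean required;

        Clause(String term, boolean required) {
            this.term = term;
            this.required = required;
        }
    }

    // Default-AND expansion: every parsed term becomes a required clause.
    static List<Clause> allRequired(List<String> terms) {
        List<Clause> clauses = new ArrayList<>();
        for (String t : terms) clauses.add(new Clause(t, true));
        return clauses;
    }

    // A term "occurs" if it appears in any field of the document.
    static boolean occursInSomeField(String term, Map<String, List<String>> doc) {
        for (List<String> tokens : doc.values()) {
            if (tokens.contains(term)) return true;
        }
        return false;
    }

    // The document matches when every required clause's term occurs in at least one field.
    static boolean matches(List<Clause> clauses, Map<String, List<String>> doc) {
        for (Clause c : clauses) {
            if (c.required && !occursInSomeField(c.term, doc)) return false;
        }
        return true;
    }

    // Term diversity: how many clauses have a term occurring somewhere in the document.
    static int diversity(List<Clause> clauses, Map<String, List<String>> doc) {
        int n = 0;
        for (Clause c : clauses) {
            if (occursInSomeField(c.term, doc)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = Map.of(
                "title", List.of("t1", "t2"),
                "body", List.of("t2", "x", "t3"));
        List<Clause> q = allRequired(List.of("t1", "t2", "t3"));
        System.out.println(matches(q, doc) + " " + diversity(q, doc));
    }
}
```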

With this new Query class, revised versions of QueryParser and
MultiFieldQueryParser would generate it.

Am I way off-base somewhere and/or is there a simpler approach to the
same end?

It just sounds like a lot to bite off at once.

What did you think of my DensityPhraseQuery proposal? We could use it in place of a PhraseQuery with slop=infinity. We'd need just one per field.

The straight boolean clauses are required for two reasons:
1. To make sure that every query term appears in some field; and
2. To reward a term that occurs frequently in a field, but near no other query terms.
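One possible reading of this expansion is sketched below as a string builder in Java. Each term becomes a required disjunction over all fields (reason 1), and the term clauses inside it also reward frequent occurrences in any field (reason 2). Each field then gets a single proximity clause. The density(...) notation is a hypothetical placeholder for the proposed DensityPhraseQuery, not Lucene query syntax, and field boosts are omitted.

```java
import java.util.List;
import java.util.StringJoiner;

public class ExpansionSketch {
    // Builds a textual view of the expansion (hypothetical notation):
    //  - one required disjunction per term, e.g. +(title:t1 body:t1), so each term must
    //    appear in some field and its per-field term clauses can score on frequency;
    //  - one optional proximity clause per field, written density(field: terms...).
    static String expand(List<String> terms, List<String> fields) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : terms) {
            StringJoiner perField = new StringJoiner(" ", "+(", ")");
            for (String field : fields) perField.add(field + ":" + term);
            query.add(perField.toString());
        }
        for (String field : fields) {
            query.add("density(" + field + ": " + String.join(" ", terms) + ")");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("t1", "t2", "t3"), List.of("title", "body")));
    }
}
```

For the terms t1 t2 t3 over title and body, this yields three required per-term groups followed by one density clause for title and one for body.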


Sure, idf is important enough to evaluate independently as a factor.
However, I do not think these considerations are orthogonal.  For
example, I'm putting a lot of weight in field boosting and don't want
the preference of title matches over body matches to be overwhelmed by
idf.

If field boosting then needs to trump idf, we should be able to deal with that when we subsequently tune field boosting, no? We could, e.g., square the field boosts if needed.
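Here is a rough numeric illustration of squaring the boosts. It assumes a simplified per-clause weight of boost * idf^2, which is how Lucene's classic TF-IDF scoring combines the two factors, but it ignores tf, length norms, coord and queryNorm. It also assumes the DefaultSimilarity idf formula. The collection size, document frequencies and the title boost of 3 are made-up values.

```java
public class BoostVsIdf {
    // Lucene's classic (DefaultSimilarity) idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Simplified clause weight: boost * idf^2 (ignores tf, length norms, coord, queryNorm).
    static double contribution(double boost, long docFreq, long numDocs) {
        double w = idf(docFreq, numDocs);
        return boost * w * w;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000;
        double titleBoost = 3.0, bodyBoost = 1.0;
        // Hypothetical: the title matches a common term, the body matches a rarer one.
        double body = contribution(bodyBoost, 1_000, numDocs);
        double title = contribution(titleBoost, 100_000, numDocs);
        double titleSquared = contribution(titleBoost * titleBoost, 100_000, numDocs);
        System.out.printf("body=%.2f title=%.2f title(boost^2)=%.2f%n", body, title, titleSquared);
    }
}
```

With these numbers the rarer body term outweighs the boosted title match. Squaring the title boost reverses that ordering, while the body boost of 1 is unchanged by squaring.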


Doug
