That expansion is scalable, but it only accounts for proximity of all query terms together. For example, it does not favor a match where t1 and t2 are close together while t3 is distant over a match where all three terms are distant. Worse, it would not favor a match with t1 and t2 in a short title and t2 and t3 proximal in the content (with no occurrence of t1 in the content) over a match with t1 and t2 in the title and t2 and t3 distant in the content.
Right. I just mentioned this same weakness in a message replying to David.
> Is that distinct from my goal to develop an improved MultiFieldQueryParser for Lucene 2.0?
Not distinct, but I think the first step is to decide on the expansion we want. Unless somebody has a better idea, I think the best solution is a new Query class that simultaneously supports multiple fields, term diversity and term proximity. It would be similar to SpansQuery, but generalized. It would be like BooleanQuery in the sense that individual query clauses could be required or not. Then, default AND could be achieved by expanding queries to all-required.
With this new Query class in place, revised versions of QueryParser and MultiFieldQueryParser would generate it.
Am I way off-base somewhere and/or is there a simpler approach to the same end?
It just sounds like a lot to bite off at once.
What did you think of my DensityPhraseQuery proposal? We could use this in place of a PhraseQuery w/ slop=infinity. We'd need just one per field.
The straight boolean clauses are required for two reasons:
1. To make sure that every query term appears in some field; and
2. To reward a term that occurs frequently in a field, but near no other query terms.
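To make the shape of that expansion concrete, here is a rough sketch that builds it as a query string: one required boolean clause per term, spread across the fields, plus one optional sloppy phrase per field. The class and method names, the field names, the boosts and the slop value are all made up for illustration, and a large finite slop stands in for the proposed DensityPhraseQuery, which does not exist. Terms are not escaped, so this is only suitable for plain words.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: illustrates the expansion only.
public class QueryExpansionSketch {

    /** Expands plain terms into a query string over several boosted fields. */
    static String expand(List<String> terms, Map<String, Float> fieldBoosts, int slop) {
        List<String> clauses = new ArrayList<>();

        // One required clause per term: the term must occur in at least one
        // field, and a term that is frequent in a field is still rewarded
        // even if it is near no other query term.
        for (String term : terms) {
            List<String> perField = new ArrayList<>();
            for (Map.Entry<String, Float> field : fieldBoosts.entrySet()) {
                perField.add(field.getKey() + ":" + term + "^" + field.getValue());
            }
            clauses.add("+(" + String.join(" ", perField) + ")");
        }

        // One optional sloppy phrase per field to reward proximity.
        // A single term has no proximity to reward, so skip it then.
        if (terms.size() > 1) {
            String phrase = String.join(" ", terms);
            for (Map.Entry<String, Float> field : fieldBoosts.entrySet()) {
                clauses.add(field.getKey() + ":\"" + phrase + "\"~" + slop
                        + "^" + field.getValue());
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps field order stable
        boosts.put("title", 2.0f); // assumed boost values
        boosts.put("body", 1.0f);
        System.out.println(expand(List.of("t1", "t2", "t3"), boosts, 10));
    }
}
```

Note that the phrase clauses here still only measure proximity of all terms together within a field, so this sketch shares the weakness discussed above.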
Sure, idf is important enough to evaluate independently as a factor. However, I do not think these considerations are orthogonal. For example, I'm putting a lot of weight on field boosting and don't want the preference for title matches over body matches to be overwhelmed by the idfs.
If field boosting then needs to trump idf, we should be able to deal with that when we subsequently tune field boosting, no? We can, e.g., square the field boosts if we need to.
Doug