Doug Cutting wrote:
> What did you think of my DensityPhraseQuery proposal?
It is a step in the direction of what I have in mind, but I'd like to go further. How about a query class with these properties:

1. Inputs are:
   a. F = list of fields
   b. B = list of field boosts (1:1 correspondence with F)
   c. T = list of terms or phrases, each either optional or required
   d. P = proximity-sloping window
2. Generate matches that contain every required T in some F and, if there are no required T's, at least one optional T in some F.
3. Score matches based on these considerations:
   a. Normal TermQuery and PhraseQuery scores for individual matches in individual fields.
   b. Boost scores for proximity of TermQuery and PhraseQuery matches in individual fields, based on some function of P (term proximity).
   c. Boost scores based on the number of optional T's matched in at least one F (term diversity).
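The matching rule in property 2 can be sketched without Lucene, which may help separate it from the scoring questions in property 3. This is a minimal sketch under stated assumptions: a document is modeled as a map from field name to its set of terms, only single terms are handled (no phrases, boosts, or proximity), and the class and method names are hypothetical. The optionalDiversity method counts the optional T's matched in at least one F, the quantity that 3c would boost.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a document maps each field name to the set of terms in that field.
// Only single terms are modeled; phrases, boosts and proximity are left out.
final class MultiFieldMatch {

    /** A query term T, flagged as required or optional. */
    static final class QueryTerm {
        final String text;
        final boolean required;

        QueryTerm(String text, boolean required) {
            this.text = text;
            this.required = required;
        }
    }

    /** True if the term occurs in at least one field F. */
    static boolean inSomeField(Map<String, Set<String>> doc, List<String> fields, String term) {
        for (String f : fields) {
            Set<String> terms = doc.get(f);
            if (terms != null && terms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    /** Property 2: every required T in some F; with no required T's, at least one optional T in some F. */
    static boolean matches(Map<String, Set<String>> doc, List<String> fields, List<QueryTerm> query) {
        boolean anyRequired = false;
        boolean anyOptionalHit = false;
        for (QueryTerm t : query) {
            boolean hit = inSomeField(doc, fields, t.text);
            if (t.required) {
                anyRequired = true;
                if (!hit) {
                    return false; // a required T absent from every field rules the document out
                }
            } else if (hit) {
                anyOptionalHit = true;
            }
        }
        return anyRequired || anyOptionalHit;
    }

    /** Consideration 3c: how many optional T's match in at least one F (term diversity). */
    static int optionalDiversity(Map<String, Set<String>> doc, List<String> fields, List<QueryTerm> query) {
        int n = 0;
        for (QueryTerm t : query) {
            if (!t.required && inSomeField(doc, fields, t.text)) {
                n++;
            }
        }
        return n;
    }
}
```

Note that matching and diversity counting only need "in some F"; which field matched, and with what boost, matters only once scoring (3a, 3b) comes in.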
That's a lot of functionality bundled into a single Query class! I'd rather make it possible to assemble this from reusable parts. And it almost can be already. Then we can offer such a thing pre-packaged.
So let me take it point-by-point:
1a-c is the new MultiFieldQueryParser implementation.
1d is Similarity.sloppyFreq().
2 is BooleanQuery (except the weird optional stuff).
3a is TermQuery and PhraseQuery.
3b is DensityPhraseQuery (to be implemented).
3c is Similarity.coord().
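Two of the hooks named above reduce to simple formulas in Lucene's DefaultSimilarity of this period: coord() is the fraction of query clauses a document matched, and sloppyFreq() weights a sloppy phrase match by 1/(distance + 1). The standalone sketch below restates those two formulas without depending on Lucene; the class name SimilarityFormulas is made up for illustration, and later Lucene versions may define these differently.

```java
// Standalone restatement of two DefaultSimilarity formulas; no Lucene dependency.
final class SimilarityFormulas {

    /** Like DefaultSimilarity.coord(): fraction of the query's clauses that matched (3c). */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Like DefaultSimilarity.sloppyFreq(): a match at edit distance d counts as 1/(d + 1) (1d). */
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }
}
```

For example, a document matching 2 of 4 optional clauses gets coord 0.5, and a phrase matched with distance 3 contributes a frequency of 0.25. Changing either behavior would mean overriding the corresponding method in a Similarity subclass rather than writing a new Query class.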
So I think this can be implemented using the expansion I proposed yesterday for MultiFieldQueryParser, plus something like my DensityPhraseQuery and perhaps a few Similarity tweaks.
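As a rough illustration of assembling the pieces, the sketch below expands a list of terms and phrases into Lucene query syntax: each clause becomes a disjunction over all fields F, each field clause carries its boost from B, required clauses get a leading +, and phrases get a ~slop. The class and method names are hypothetical, and this is not necessarily the MultiFieldQueryParser expansion proposed in the earlier message, which is not quoted here.

```java
import java.util.List;

// Hypothetical helper: builds a query string in Lucene's query syntax from F, B and T.
final class MultiFieldExpansion {

    /** One clause T; text containing a space is treated as a phrase. */
    static final class Clause {
        final String text;
        final boolean required;

        Clause(String text, boolean required) {
            this.text = text;
            this.required = required;
        }
    }

    /** Expands each clause into (f1:t^b1 f2:t^b2 ...), marking required clauses with '+'. */
    static String expand(List<String> fields, List<Float> boosts, List<Clause> clauses, int slop) {
        if (fields.size() != boosts.size()) {
            throw new IllegalArgumentException("fields and boosts must correspond 1:1");
        }
        StringBuilder out = new StringBuilder();
        for (Clause c : clauses) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (c.required) {
                out.append('+'); // required: must match in at least one field
            }
            out.append('(');
            boolean phrase = c.text.contains(" ");
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) {
                    out.append(' ');
                }
                out.append(fields.get(i)).append(':');
                if (phrase) {
                    out.append('"').append(c.text).append('"').append('~').append(slop);
                } else {
                    out.append(c.text);
                }
                out.append('^').append(boosts.get(i)); // per-field boost from B
            }
            out.append(')');
        }
        return out.toString();
    }
}
```

For fields title and body with boosts 4 and 1, a required term lucene and an optional phrase "search engine" with slop 2, this produces `+(title:lucene^4.0 body:lucene^1.0) (title:"search engine"~2^4.0 body:"search engine"~2^1.0)`. Proximity between separate clauses (3b) is not captured by such an expansion, which is the gap DensityPhraseQuery is meant to fill.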
> If field boosting needs to then trump idf, we should be able to deal with that when we subsequently tune field boosting, no? We can, e.g., square the field boosts if we need.
Perhaps, but that seems to me to be a hack on top of a hack. Current literature seems consistently not to square idf -- I found one reference that specifically says even Salton removed the squaring after he first proposed it a long time ago. The simpler solution is just to remove the squaring.
I wasn't arguing that we shouldn't alter the idf definition. Precisely the opposite in fact. If squaring idf is bad, then that should show up in single-field search and we can adjust it in that context. You had claimed that good idf formulation is confounded with multi-field search. I do not believe that and that's what I was speaking to. The Salton work you cite is all single-field stuff.
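To make the idf/boost trade-off concrete, here is a simplified model of a classic Lucene term weight. DefaultSimilarity's idf, ln(numDocs / (docFreq + 1)) + 1, enters a TermQuery's score twice (once in the query weight and once in the document-side weight), so the score grows roughly as idf squared times the boost. The sketch omits query and length norms, makes the idf and boost exponents parameters, and uses made-up document counts; it illustrates the argument and is not Lucene's scorer.

```java
// Simplified single-term weight; queryNorm and lengthNorm are deliberately omitted.
final class IdfSquaring {

    /** DefaultSimilarity-style idf: ln(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** sqrt(freq) * idf^idfPower * boost^boostPower; idfPower 2 mimics the squared idf. */
    static double weight(int freq, int docFreq, int numDocs,
                         double boost, int idfPower, int boostPower) {
        return Math.sqrt(freq)
                * Math.pow(idf(docFreq, numDocs), idfPower)
                * Math.pow(boost, boostPower);
    }
}
```

Take 10,000 documents, a body-only match (boost 1) on a term with docFreq 9, and a title match (boost 3) on a term with docFreq 999. With idf unsquared the title match scores higher (about 9.9 vs 7.9); with idf squared the body match wins (about 62.5 vs 32.7); squaring the boosts as well flips it back to the title (about 98.2 vs 62.5). The same ordering effect shows up for a single field once boosts are left out, which is consistent with treating the idf question as a single-field one.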
Doug