Hi Joaquin, Check this: http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg05504.html
Otis

--- Joaquin Delgado <[EMAIL PROTECTED]> wrote:

> Is there any proposal to add a proper NEAR (proximity) operator to the
> default query language that can handle phrase proximity, implemented as
> SpanNearQuery?
>
> With all the conversations about density queries and searching for
> "concepts" that appear in different fields, it just seems logical to
> treat exact phrases as single terms when users explicitly decide to
> use quotes along with unquoted terms.
>
> J.D.
>
> -----Original Message-----
> From: Chuck Williams [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 31, 2005 6:20 PM
> To: Lucene Developers List
> Subject: RE: URL to compare 2 Similarity's ready-- Re: Scoring benchmark
> evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher
> problems with Similarity.docFreq() ?
>
> Doug Cutting wrote:
> > What did you think of my DensityPhraseQuery proposal?
>
> It is a step in the direction of what I have in mind, but I'd like to go
> further. How about a query class with these properties:
>   1. Inputs are:
>      a. F = list of fields
>      b. B = list of field boosts (1:1 correspondence with F)
>      c. T = list of terms or phrases, each either optional or required
>      d. P = proximity-sloping window
>   2. Generate matches that contain every required T in some F, and if
>      no required T's then at least one optional T in some F.
>   3. Score matches based on these considerations:
>      a. Normal TermQuery and PhraseQuery scores for individual matches
>         in individual fields.
>      b. Boost scores for proximity of TermQuery and PhraseQuery
>         matches in individual fields, based on some function of P (term
>         proximity).
>      c. Boost scores based on number of optional T's matched in at
>         least one F (term diversity).
>
> I think that meets all the objectives of my earlier posts. I'd like to
> have it, and would be happy to contribute it if it sounds like the right
> thing.
>
> Is there a better way?
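
For anyone trying to picture the matching rule in item 2 of Chuck's list, here is a rough sketch in plain Python. It is not Lucene code and not Chuck's implementation: `Clause`, `found_in`, `matches` and `diversity` are hypothetical names, and a case-insensitive substring test stands in for real TermQuery/PhraseQuery matching. The boosts B and the proximity window P (items 3a and 3b) are left out because the post does not define the scoring functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    text: str       # one T: a term or a phrase
    required: bool  # True = required, False = optional


def found_in(doc, field, clause):
    # Stand-in for a TermQuery/PhraseQuery match (assumption): substring test.
    return clause.text.lower() in doc.get(field, "").lower()


def matches(doc, fields, clauses):
    """Item 2: every required T in some F; if there are no required T's,
    at least one optional T in some F."""
    hit = {c: any(found_in(doc, f, c) for f in fields) for c in clauses}
    required = [c for c in clauses if c.required]
    if required:
        return all(hit[c] for c in required)
    return any(hit.values())


def diversity(doc, fields, clauses):
    """Input to item 3c: number of optional T's matched in at least one F."""
    return sum(
        1 for c in clauses
        if not c.required and any(found_in(doc, f, c) for f in fields)
    )


# Made-up document and query, for illustration only.
doc = {"title": "Lucene scoring", "body": "Field boosts and term proximity in Lucene"}
fields = ["title", "body"]
clauses = [Clause("lucene", True), Clause("term proximity", False), Clause("idf", False)]

print(matches(doc, fields, clauses))    # is every required clause found in some field?
print(diversity(doc, fields, clauses))  # how many optional clauses matched somewhere?
```

Marking every clause as required gives the "default AND" behaviour discussed further down the thread.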
>
> > If field boosting needs to then trump idf, we should be able to deal
> > with that when we subsequently tune field boosting, no? We can, e.g.,
> > square the field boosts if we need.
>
> Perhaps, but that seems to me to be a hack on top of a hack. Current
> literature seems to consistently not square idf -- I found one reference
> that specifically says even Salton removed the squaring after he first
> proposed it a long time ago. The simpler solution is just to remove the
> squaring.
>
> Chuck
>
> > -----Original Message-----
> > From: Doug Cutting [mailto:[EMAIL PROTECTED]
> > Sent: Monday, January 31, 2005 3:04 PM
> > To: Lucene Developers List
> > Subject: Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark
> > evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher
> > problems with Similarity.docFreq() ?
> >
> > Chuck Williams wrote:
> > > That expansion is scalable, but it only accounts for proximity of all
> > > query terms together. E.g., it does not favor a match where t1 and t2
> > > are close together while t3 is distant over a match where all 3 terms
> > > are distant. Worse, it would not favor a match with t1 and t2 in a
> > > short title, and t2 and t3 proximal in the content (with no occurrence
> > > of t1 in the content) vs. a match with t1 and t2 in the title and t2
> > > and t3 distant in the content.
> >
> > Right. I just mentioned this same weakness in a message replying to
> > David.
> >
> > > > Is that distinct from my goal to develop an improved
> > > > MultiFieldQueryParser for Lucene 2.0?
> > >
> > > Not distinct, but I think the first step is to decide on the expansion
> > > we want. Unless somebody has a better idea, I think the best solution
> > > is a new Query class that simultaneously supports multiple fields, term
> > > diversity and term proximity. It would be similar to SpansQuery, but
> > > generalized.
> > > It would be like BooleanQuery in the sense that individual
> > > query clauses could be required or not. Then, default AND could be
> > > achieved by expanding queries to all-required.
> > >
> > > With this new Query class, revised versions of QueryParser and
> > > MultiFieldQueryParser would generate it.
> > >
> > > Am I way off-base somewhere and/or is there a simpler approach to the
> > > same end?
> >
> > It just sounds like a lot to bite off at once.
> >
> > What did you think of my DensityPhraseQuery proposal? We could use this
> > in place of a PhraseQuery w/ slop=infinity. We'd need just one per
> > field.
> >
> > The straight boolean clauses are required for two reasons:
> >   1. To make sure that every query term appears in some field; and
> >   2. To reward a term that occurs frequently in a field, but near no
> >      other query terms.
> >
> > > Sure, idf is important enough to evaluate independently as a factor.
> > > However, I do not think these considerations are orthogonal. For
> > > example, I'm putting a lot of weight in field boosting and don't want
> > > the preference of title matches over body matches to be overwhelmed by
> > > the idf's.
> >
> > If field boosting needs to then trump idf, we should be able to deal
> > with that when we subsequently tune field boosting, no? We can, e.g.,
> > square the field boosts if we need.
> >
> > Doug
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
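
To put rough numbers on the idf-squaring exchange quoted above, here is a toy calculation in plain Python. Everything in it is an assumption made for illustration: the collection size, the document frequencies, the 4x title boost, and the idf form `1 + ln(N / (df + 1))`. tf and length normalisation are held at 1 so that only boost and idf vary. It compares a frequent term matched in a boosted title against a rare term matched in the body, under three settings: idf unsquared, idf squared, and idf squared with the boosts squared as well.

```python
import math


def idf(num_docs, doc_freq):
    # Assumed idf form for this sketch: 1 + ln(N / (df + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(boost, idf_value, idf_power=1, boost_power=1):
    # Toy score: tf and length normalisation are fixed at 1.
    return (boost ** boost_power) * (idf_value ** idf_power)


N = 1_000_000             # hypothetical collection size
common = idf(N, 100_000)  # frequent term, matched in the title
rare = idf(N, 100)        # rare term, matched in the body
TITLE_BOOST, BODY_BOOST = 4.0, 1.0  # hypothetical field boosts

for label, idf_power, boost_power in [
    ("idf unsquared", 1, 1),
    ("idf squared", 2, 1),
    ("idf squared, boosts squared", 2, 2),
]:
    title = score(TITLE_BOOST, common, idf_power, boost_power)
    body = score(BODY_BOOST, rare, idf_power, boost_power)
    winner = "title" if title > body else "body"
    print(f"{label}: title={title:.1f} body={body:.1f} -> {winner}")
```

With these made-up numbers the title match wins when idf is not squared, loses once idf is squared, and wins again if the boosts are squared too, which is the trade-off Chuck and Doug are debating. Different numbers can change the outcome.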