I think that works. When originally looking for solutions, I hadn't thought of overriding the BooleanQuery.getSimilarity() method selectively. You obviously have more familiarity with these classes :-).
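To make the comparison concrete, here is a rough toy model of the albino elephant example quoted below. It is only a sketch and not Lucene code: the class ToyScoring and its methods are hypothetical, a matching clause simply scores its field boost (no tf, idf, or norms), and the nested coord of the original query is ignored. It compares the flat sum, per-term inner queries with coord applied only on the outer query, and a max-plus-tie-breaker variant in the spirit of MaxDisjunctionQuery.

```java
import java.util.Map;
import java.util.Set;

/** Toy scoring model for illustration only. NOT Lucene code; all names are hypothetical. */
final class ToyScoring {
    static final double TITLE_BOOST = 4.0; // title clauses are boosted ^4, as in the example
    static final double DESC_BOOST = 1.0;

    /** Score of one field:term clause: the field boost if the term occurs, else 0. */
    static double clause(Map<String, Set<String>> doc, String field, String term) {
        double boost = field.equals("title") ? TITLE_BOOST : DESC_BOOST;
        return doc.getOrDefault(field, Set.of()).contains(term) ? boost : 0.0;
    }

    /** coord = (overlap / max)^exponent; exponent 1 is the plain ratio. */
    static double coord(int overlap, int max, double exponent) {
        return Math.pow((double) overlap / max, exponent);
    }

    /** title:(terms)^4 description:(terms), modelled as a flat sum of matching clauses. */
    static double flatSum(Map<String, Set<String>> doc, String... terms) {
        double sum = 0.0;
        for (String t : terms) {
            sum += clause(doc, "title", t) + clause(doc, "description", t);
        }
        return sum;
    }

    /**
     * One inner query per term, (title:t description:t), with no coord on the inner
     * query and coord applied only on the outer query. If useMax is true, the inner
     * score is max + tieBreaker * (the other clause) instead of the plain sum.
     */
    static double perTerm(Map<String, Set<String>> doc, boolean useMax,
                          double tieBreaker, double coordExponent, String... terms) {
        double total = 0.0;
        int overlap = 0; // number of inner queries that matched at all
        for (String t : terms) {
            double inTitle = clause(doc, "title", t);
            double inDesc = clause(doc, "description", t);
            double inner = useMax
                    ? Math.max(inTitle, inDesc) + tieBreaker * Math.min(inTitle, inDesc)
                    : inTitle + inDesc;
            if (inner > 0) {
                overlap++;
            }
            total += inner;
        }
        return total * coord(overlap, terms.length, coordExponent);
    }
}
```

In this model a document with albino in both fields and a document with albino in the title and elephant in the description tie under flatSum, while both perTerm variants rank the second document higher. Raising coordExponent to 2 only widens that gap.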

Thanks,

Chuck

> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 13, 2004 12:56 PM
> To: Lucene Developers List
> Subject: Re: Contribution: better multi-field searching
>
> Chuck Williams wrote:
> > That approach does not work. I could not find an approach that would
> > work with the built-in classes, although of course there might be one.
> > The problem has two components: coord and the fact that BooleanQuery's
> > sum their clause scores to compute the final score. The latter is not
> > easily overridden. Specifically,
> >
> >   title:(albino elephant)^4 description:(albino elephant)
> >
> > still has the problem that a result with albino in the title and albino
> > in the description gets the same score as a result with albino in the
> > title and elephant in the description
>
> Perhaps I misunderstood what you desire. You want a reward for albino
> and elephant both occurring in the document, regardless of field, if so,
> then what you'd want is:
>
>   (title:albino description:albino) (title:elephant description:elephant)
>
> with coord disabled on the *inner* queries, no? This way coord would
> explicitly boost documents which matched on both terms.
>
> > FYI, MaxDisjunctionQuery has made an enormous improvement in the quality
> > of my query results, and I have strong reason to believe the same would
> > be true in most other domains (more on that coming in the idf^2
> > discussion). In terms of the albino elephant example, the query above
> > was putting all the albino animals except elephants above the albino
> > elephants, while the query with an outer BooleanQuery and inner
> > MaxDisjunctionQuery's
> >
> >   ( (title:albino^4 | description:albino)~0.1
> >     (title:elephant^4 | description:elephant)~0.1
> >   )
> >
> > properly puts the albino elephants on top.
>
> If "albino" is outscoring "elephant" then you could either reduce the
> impact of idf or increase the impact of coordination. Did you try,
> e.g., defining coord as (overlap/max)^2 or somesuch?
>
> Or, perhaps take proximity into account, with "albino elephant"~10? Or
> simply using AND instead of OR? These days most web search engines use
> AND as the default operator and reward for proximity. Is that wrong for
> your application? AND is effectively a coord of (overlap/max)^infinity.
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]