boost:(+petroleum +engineer +refinery)
(+contents:(+petroleum +engineer +refinery)
+((*:* -boost:petroleum)
(*:* -boost:engineer)
(*:* -boost:refinery)))
That's an interesting solution. Would this result in many more documents
being visited by the scorer, possibly impacting performance? (I haven't
tried it yet).
Thanks,
Peter
On Thu, Nov 6, 2008 at 6:56 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote:
> Hi Peter,
>
> On 11/06/2008 at 4:25 PM, Peter Keegan wrote:
> > I've discovered another flaw in using this technique:
> >
> > (+contents:petroleum +contents:engineer +contents:refinery)
> > (+boost:petroleum +boost:engineer +boost:refinery)
> >
> > It's possible that the first clause will produce a matching
> > doc and none of the terms in the second clause are used to
> > score that doc. Yet another reason to use BoostingTermQuery.
>
> I think you could address this, without BTQ, using something like:
>
> boost:(+petroleum +engineer +refinery)
> (+contents:(+petroleum +engineer +refinery)
> +((*:* -boost:petroleum)
> (*:* -boost:engineer)
> (*:* -boost:refinery)))
>
> The last three lines gives you the set of documents that are missing at
> least one of the terms in the "boost" field. The *:* thingy, indicating a
> MatchAllDocsQuery, is necessary to get all documents that don't have a given
> term; Lucene's (sub-)query document exclusion operation needs a non-empty
> set on which to operate.
>
> On 11/06/2008 at 1:08 PM, Peter Keegan wrote:
> > Then, at search time, a query for "petroleum engineer" gets rewritten
> > to: (+contents:petroleum +contents:engineer) (+boost:petroleum
> > +boost:engineer). Note that the two clauses are OR'd so that a term that
> > exists in both fields will get a higher weight in the 'boost' field.
> > This works quite well at boosting documents with terms that exist in the
> > boosted fields. However, it doesn't work properly if excluded terms are
> > added, for example:
> >
> > (+contents:petroleum +contents:engineer -contents:drilling)
> > (+boost:petroleum +boost:engineer -boost:drilling)
> >
> > If a document contains the term 'drilling' in the 'body'
> > field, but not in the 'title' or 'city' field, a false hit occurs.
>
> I think you could address this problem like this:
>
> +(boost:(+petroleum +engineer)
> (+contents:(+petroleum +engineer)
> +((*:* -boost:petroleum)
> (*:* -boost:engineer))))
> -contents:drilling
>
> You don't have to include "-boost:drilling", because this condition is
> entailed by "-contents:drilling".
>
> Steve
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>