Hi Joel,

I have run into the same problem.
Could you please elaborate a bit on this?

Many thanks,
Liat

2009/11/2 Joel Halbert <[email protected]>

> I opted to use the following query to solve this problem, since it meets
> my requirements, for the time being.
>
> +(cheese sandwich) "cheese sandwich"~slop
>
> This includes documents with one or more of the terms, but prefers those
> with an edit distance <= the slop.
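
A minimal sketch of how the query string above could be assembled, in case it helps. This is plain Java string handling only, not a Lucene API; the class name, the method name and the slop value of 5 are hypothetical, and the terms are assumed to contain no query-syntax characters that would need escaping.

```java
import java.util.List;

public class ProximityBoostQuery {
    // Builds: +(t1 t2 ...) "t1 t2 ..."~slop
    // The required clause matches documents with any of the terms; the
    // optional sloppy phrase only adds score when the terms are close.
    static String build(List<String> terms, int slop) {
        String joined = String.join(" ", terms);
        return "+(" + joined + ") \"" + joined + "\"~" + slop;
    }

    public static void main(String[] args) {
        // slop = 5 is an arbitrary illustrative value
        System.out.println(build(List.of("cheese", "sandwich"), 5));
    }
}
```

The resulting string would then be handed to whatever query parser is already in use.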
>
>
> -----Original Message-----
> From: Joel Halbert <[email protected]>
> Reply-To: [email protected]
> To: [email protected]
> Subject: Re: scoring adjacent terms without proximity search
> Date: Sat, 31 Oct 2009 08:38:29 +0000
>
> Thank you all for your suggestions, I shall have a little think about
> the best way forward, and report back if I do anything interesting that
> works well.
>
> In answer to Grant's question (why not use PhraseQuery?): we do not want
> to have an artificial upper limit on the slop, i.e. we do want to
> include documents that might only have a subset of words from the
> phrase (e.g. just cheese, or just sandwich, but not both).
>
>
> -----Original Message-----
> From: Robert Muir <[email protected]>
> Reply-To: [email protected]
> To: [email protected]
> Subject: Re: scoring adjacent terms without proximity search
> Date: Fri, 30 Oct 2009 16:04:03 -0400
>
> > I suppose you could precompute the proximity associations by indexing
> > n-grams (in this case, Lucene calls them shingles), such that there
> > is a single token in your index containing cheese_sandwich (effectively)
> >
> >
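
To make the shingle idea concrete, here is a rough sketch in plain Java. It is not Lucene's shingle filter, only an illustration of joining adjacent word tokens into a single token; the class and method names are hypothetical, and the "_" separator is assumed from the cheese_sandwich spelling used in the message.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    // Joins each pair of adjacent tokens into one two-word shingle.
    static List<String> bigramShingles(List<String> tokens) {
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            shingles.add(tokens.get(i) + "_" + tokens.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        // prints [grilled_cheese, cheese_sandwich]
        System.out.println(bigramShingles(List.of("grilled", "cheese", "sandwich")));
    }
}
```

Indexing such tokens alongside the single words means a query for cheese_sandwich matches only documents where the two words are adjacent, without a proximity search at query time.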
> doh, I see Grant already led you in this direction. (sorry for the
> duplicate mail)
> on average it's worked for me for some things like this.
>
> although, I'll try to contribute something actually useful, and mention
> that if you use things like shingles, it's good to consider modifying
> DefaultSimilarity; look at the setDiscountOverlaps param.
> otherwise, I've measured cases where injecting additional tokens will cause
> more harm than good, because it has an adverse effect on lengthNorm.
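
A small arithmetic sketch of the length-norm effect described above. It assumes the norm is roughly 1/sqrt(number of counted tokens), ignoring field boost and norm encoding, and that injected shingles sit at the same position as an existing token so that discounting overlaps leaves them out of the count. The class, method and parameter names are hypothetical, not Lucene's.

```java
public class LengthNormSketch {
    // Simplified length norm: 1 / sqrt(number of tokens counted).
    // With discountOverlaps, tokens stacked on an existing position
    // (e.g. injected shingles) are excluded from the count.
    static double lengthNorm(int totalTokens, int overlapTokens, boolean discountOverlaps) {
        int counted = discountOverlaps ? totalTokens - overlapTokens : totalTokens;
        return 1.0 / Math.sqrt(counted);
    }

    public static void main(String[] args) {
        // "grilled cheese sandwich": 3 words plus 2 injected shingles
        System.out.println("without discount: " + lengthNorm(5, 2, false));
        System.out.println("with discount:    " + lengthNorm(5, 2, true));
    }
}
```

Without the discount the field looks longer than it really is, so every match in it is scored lower than it would be in the same text indexed without shingles.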
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
