On Dec 5, 2005, at 11:32 PM, Dan Climan wrote:
Do stopfilters create non-contiguous token positions?

No, not currently. StopFilter does not adjust the position increment of the tokens it keeps, so each one retains the default increment of 1 and positions remain contiguous.

There is an open issue to change this behavior, though. At one point I changed it temporarily, but it caused problems with PhraseQuery and QueryParser. PhraseQuery now supports explicit term positions, and QueryParser also supports setting the PhraseQuery term positions appropriately. So perhaps it is time to change StopFilter, or at least to make this an optional feature.

I like the idea of leaving holes in the token positions. They give a more accurate picture of the original text, so phrase queries would not match across a removed stop word unless some slop is specified.
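
To make the difference concrete, here is a minimal sketch in plain Java. It does not use Lucene's classes at all: StopPositionSketch, Tok, analyze, matchesExactPhrase, and the stop word list are all hypothetical names made up for illustration, and the matching is a simplified stand-in for an exact (zero slop) phrase match, not PhraseQuery's actual implementation. It assumes whitespace tokenization and 0-based positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; none of this is Lucene API.
class StopPositionSketch {
    // Assumed stop word list for the example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "of", "a"));

    // A kept term together with its absolute position in the token stream.
    static final class Tok {
        final String term;
        final int position;
        Tok(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Drops stop words. With leaveHoles=false the kept tokens are numbered
    // contiguously; with leaveHoles=true each kept token retains the
    // position it had in the original text, leaving gaps.
    static List<Tok> analyze(String text, boolean leaveHoles) {
        List<Tok> out = new ArrayList<>();
        String[] words = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty() || STOP_WORDS.contains(words[i])) {
                continue;
            }
            int position = leaveHoles ? i : out.size();
            out.add(new Tok(words[i], position));
        }
        return out;
    }

    // True if the document contains the given term at the given position.
    static boolean contains(List<Tok> doc, String term, int position) {
        for (Tok t : doc) {
            if (t.position == position && t.term.equals(term)) {
                return true;
            }
        }
        return false;
    }

    // Exact phrase match with no slop: every query term must occur in the
    // document at the same relative position it has in the query.
    static boolean matchesExactPhrase(List<Tok> doc, List<Tok> query) {
        if (query.isEmpty()) {
            return false;
        }
        Tok first = query.get(0);
        for (Tok start : doc) {
            if (!start.term.equals(first.term)) {
                continue;
            }
            boolean all = true;
            for (Tok q : query) {
                int wanted = start.position + (q.position - first.position);
                if (!contains(doc, q.term, wanted)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "wizard of oz";
        // Contiguous positions: "wizard oz" matches across the removed "of".
        System.out.println(matchesExactPhrase(
                analyze(doc, false), analyze("wizard oz", false))); // true
        // Holes preserved: the gap keeps "wizard oz" from matching.
        System.out.println(matchesExactPhrase(
                analyze(doc, true), analyze("wizard oz", true)));   // false
        // Same analysis on both sides: the original phrase still matches.
        System.out.println(matchesExactPhrase(
                analyze(doc, true), analyze("wizard of oz", true))); // true
    }
}
```

Under these assumptions, the last case also shows that a query run through the same analysis ends up with the same hole as the indexed text, so the original phrase keeps matching even when positions are no longer contiguous.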

The javadocs for this method note that:

tokenPositionsGuaranteedContiguous - true if the token position numbers have
no overlaps or gaps.

You will want this to be set true.

I was curious whether stopwords, by definition, meant that tokens were not
contiguous. Is this still true if the query uses the same analyzer and
filters out the same stopwords?

Currently, all built-in analyzers produce contiguous token positions, regardless of any tokens that may have been removed.

        Erik

