Do stopfilters create non-contiguous token positions?
 
I was interested in experimenting with the highlighter and using the
TokenSources.getTokenStream(TermPositionVector tpv, boolean
tokenPositionsGuaranteedContiguous) method.
 
The javadocs for this method note that:

tokenPositionsGuaranteedContiguous - true if the token position numbers have
no overlaps or gaps.

 

The example used for comparison to re-analyzing the text includes
stopwords ("timings above were using a stemmer/lowercaser/stopword combo").

I was curious whether removing stopwords, by definition, means that the
token positions are no longer contiguous. Is this still true if the query
uses the same analyzer and filters out the same stopwords?
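
To make the question concrete, here is a minimal sketch in plain Java of the
two behaviours I can imagine. It is not Lucene code: the class
StopwordPositions, the methods positions and contiguous, and the keepGaps
flag are all hypothetical names for illustration, and the whitespace
tokenizer is only a stand-in for a real analyzer. With keepGaps set to true,
a removed stopword still consumes a position, so the surviving tokens have
gaps; with false, the survivors are renumbered and stay contiguous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopwordPositions {

    // Hypothetical helper, not a Lucene API. Splits on whitespace, lowercases,
    // drops stopwords, and returns the position of each surviving token.
    // keepGaps = true: a dropped stopword still uses up a position number.
    // keepGaps = false: survivors are numbered consecutively.
    static List<Integer> positions(String text, Set<String> stopwords, boolean keepGaps) {
        List<Integer> result = new ArrayList<>();
        int position = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (stopwords.contains(token)) {
                if (keepGaps) {
                    position++; // leave a hole where the stopword was
                }
                continue;
            }
            result.add(position);
            position++;
        }
        return result;
    }

    // True when each position is exactly one more than the previous one,
    // i.e. there are no gaps and no overlaps between surviving tokens.
    static boolean contiguous(List<Integer> positions) {
        for (int i = 1; i < positions.size(); i++) {
            if (positions.get(i) - positions.get(i - 1) != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "over"));
        String text = "the quick fox jumped over the lazy dog";

        List<Integer> withGaps = positions(text, stop, true);
        List<Integer> renumbered = positions(text, stop, false);

        System.out.println(withGaps + " contiguous=" + contiguous(withGaps));
        System.out.println(renumbered + " contiguous=" + contiguous(renumbered));
    }
}
```

If the stop filter behaves like the first case, I would expect to have to
pass false for tokenPositionsGuaranteedContiguous; if it behaves like the
second, true would seem safe. Which one applies is exactly what I am unsure
about.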

 

Thanks,

Dan
