How to ignore stop word gaps in queries? Lucene 4.4+

Chris Tomlinson Thu, 10 Apr 2014 07:27:10 -0700

Hello,

We're using the Lucene 4.4 that is embedded in eXist-db (exist-db.org) and, as 
the subject indicates, we want to ignore stop word gaps in queries, without the 
user having to indicate at query time where such gaps might occur.


As of Lucene 4.4, FilteringTokenFilter.setEnablePositionIncrements(false) is 
no longer available.

Prior to Lucene 4.4 it was possible to call setEnablePositionIncrements(false) 
so that, during both indexing and querying, the number and position of stop 
word gaps would be ignored.

This meant that a phrase such as:

    blue is the sky

with stop words "is" and "the" would be matched by the query:

    blue sky
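
To make the difference concrete, here is a minimal plain-Java sketch of the 
position bookkeeping involved. It is only a model written for this message, not 
Lucene code: the class StopGapSketch and its methods analyze and phraseMatches 
are hypothetical names, and the keepGaps flag merely stands in for what 
setEnablePositionIncrements(true/false) used to control.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of stop word removal and exact phrase matching.
// It does not use or reproduce any Lucene class.
class StopGapSketch {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("is", "the"));

    // Drops stop words and assigns a position to each kept token.
    // keepGaps = true:  a removed stop word still advances the position counter.
    // keepGaps = false: kept tokens are numbered consecutively (gaps ignored).
    static Map<Integer, String> analyze(String text, boolean keepGaps) {
        Map<Integer, String> byPosition = new TreeMap<>();
        int pos = -1;
        for (String tok : text.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(tok)) {
                if (keepGaps) {
                    pos++;
                }
                continue;
            }
            pos++;
            byPosition.put(pos, tok);
        }
        return byPosition;
    }

    // Exact phrase match: every query term must occur in the document at the
    // same relative position offset as in the analyzed query.
    static boolean phraseMatches(String doc, String query, boolean keepGaps) {
        Map<Integer, String> d = analyze(doc, keepGaps);
        List<Map.Entry<Integer, String>> q =
                new ArrayList<>(analyze(query, keepGaps).entrySet());
        if (q.isEmpty()) {
            return false;
        }
        int first = q.get(0).getKey();
        for (int base : d.keySet()) {
            boolean ok = true;
            for (Map.Entry<Integer, String> e : q) {
                String atDoc = d.get(base + e.getKey() - first);
                if (!e.getValue().equals(atDoc)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Gaps ignored: "blue" and "sky" are adjacent in the index.
        System.out.println(phraseMatches("blue is the sky", "blue sky", false)); // true
        // Gaps kept: "sky" sits three positions after "blue", so no match.
        System.out.println(phraseMatches("blue is the sky", "blue sky", true));  // false
    }
}
```

With keepGaps set to false the model reproduces the pre-4.4 behavior described 
above; with keepGaps set to true, the query only matches text that has the same 
number of stop words in the same places.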

We are working with Tibetan, where elisions are not uncommon, so that, e.g.:

    rin po che

on some occasions might be shortened to

    rin che

and we would like to have a query of

    rin po che

or

    rin che

find all occurrences of

    rin po che

and

    rin che

without requiring the user to mark where elisions might occur.

The 
org.apache.lucene.queryparser.flexible.standard.CommonQueryParserConfiguration 
provides a setEnablePositionIncrements method, but it does not seem to restore 
the query behavior described above, which was possible prior to Lucene 4.4.

What is the proper way to ignore stop word gaps?

Thank you,
Chris

