Hello Uwe,

Thank you for the reply. I see that there is a version check on the use of setEnablePositionIncrements(false), and I think I may be able to use an earlier API with the eXist-db embedding of Lucene 4.4 to avoid that check.

However, my question was intended to improve my understanding of how to use stop words properly and/or how to achieve the use case that I outlined.

My naive understanding of the purpose of stop words is: to remove from indexing words that are not helpful in discriminating or selecting documents since they occur so frequently.

The use case that I intended to illustrate is one in which the occurrence or non-occurrence of stop words in a query is ignored w.r.t. selection of documents during search.

As I understand the situation currently, occurrences of stop words in a query phrase are replaced by "?"s to indicate the presence of an otherwise unspecified word in the query. So the phrase:

blue is the moon

with "is" and "the" as stop words, would be indexed effectively as:

blue ? ? moon

and the query phrase (with "was" and "a" also treated as stop words):

blue was a moon

would be treated as:

blue ? ? moon

and would retrieve a document containing:

blue is the moon

But in the use case that I presented we really want the query:

blue moon

to also select the document, without the user having to indicate whether stop words may be present.

So my question is: How is one supposed to achieve this use case in Lucene 4.4+?

Thank you,
Chris

On Apr 24, 2014, at 5:52 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> You can still change the setting on the TokenFilter after creating it:
> StopFilter#setEnablePositionIncrements(false) - this method was *not* removed!
> This fails only if you pass matchVersion>=Version.LUCENE_44. Just use an
> older matchVersion parameter to the constructor and you can still enable this
> broken behavior (for backwards compatibility).
>
> This is no longer officially supported, but can be a workaround. To me it
> looks like you misunderstood stopwords.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: Tincu Gabriel [mailto:tincu.gabr...@gmail.com]
>> Sent: Thursday, April 24, 2014 12:27 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: What is the proper use of stop words in Lucene?
>>
>> Hi there,
>> The StopFilterFactory can be used to produce StopFilters with the desired
>> stop-words inside of it. As a constructor argument it takes a
>> Map<String,String> and one of the valid keys you can pass inside of that is
>> "enablePositionIncrements". If you don't pass that in then it defaults to
>> true.
>> Is this what you were looking for?
>>
>>
>> On Wed, Apr 23, 2014 at 12:36 PM, Chris Tomlinson <
>> chris.j.tomlin...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I've written several times now on the list with this question /
>>> problem and no one has yet replied so I don't know if the question is
>>> too wrong-headed or if there is simply no one reading the list that
>>> can comment on the question.
>>>
>>> The question that I'm trying to get answered is what is the correct
>>> way of ignoring stop word gaps in Lucene 4.4+?
>>>
>>> While we are using Lucene 4.4 embedded in eXist-db (exist-db.org), I
>>> think the question is a proper Lucene question and really has nothing
>>> to do with the fact that we're using it in an embedded manner.
>>>
>>> The problem to be solved is how to ignore stop word gaps in queries -
>>> without the user having to indicate where such gaps might occur at
>>> query time.
>>>
>>> Since Lucene 4.4 the
>>> FilteringTokenFilter.setEnablePositionIncrements(false) is not available.
>>> None of the resources such as the "Lucene in Action" and so on explain
>>> how to use Lucene to get the desired effect now that 4.4+ has removed
>>> the previous approach.
>>>
>>> Prior to Lucene 4.4 it was possible to
>>> setEnablePositionIncrements(false)
>>> so that during indexing and querying the number and position of stop
>>> word gaps would be ignored (as mentioned on pp 138-139 of "Lucene in
>>> Action").
>>>
>>> This meant that a document with a phrase such as:
>>>
>>> blue is the sky
>>>
>>> with stop words "is" and "the" would be selected by the query:
>>>
>>> blue sky
>>>
>>> This is what we want to achieve.
>>>
>>> Why? We are working with Tibetan and elisions are not uncommon so
>>> that, e.g.:
>>>
>>> rin po che
>>>
>>> on some occasions might be shortened to
>>>
>>> rin che
>>>
>>> and we would like to have a query of
>>>
>>> rin po che
>>>
>>> or
>>>
>>> rin che
>>>
>>> find all occurrences of
>>>
>>> rin po che
>>>
>>> and
>>>
>>> rin che
>>>
>>> without having the user have to mark where elisions might occur.
>>>
>>> The
>>> org.apache.lucene.queryparser.flexible.standard.CommonQueryParserConfiguration
>>> provides a setEnablePositionIncrements but that does not seem
>>> to work to allow for the above desired query behavior that was
>>> possible prior to Lucene 4.4.
>>>
>>> What is the proper way to ignore stop word gaps?
>>>
>>> Thank you,
>>> Chris
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
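
The position behaviour discussed in this thread can be sketched as a toy model in plain Java, with no Lucene dependency. Everything here is an assumption made for illustration: the class StopGapSketch, its methods analyze and phraseMatches, and the keepGaps flag (which stands in for the enablePositionIncrements setting) are hypothetical names, and the matching logic is a simplification of exact phrase matching, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not part of Lucene.
public class StopGapSketch {

    /** A term that survived stop-word removal, with its assigned position. */
    public static final class Tok {
        final String term;
        final int pos;
        Tok(String term, int pos) { this.term = term; this.pos = pos; }
    }

    /**
     * Removes stop words. With keepGaps=true each removed word still advances
     * the position counter, leaving a hole ("blue ? ? moon"). With
     * keepGaps=false the surviving terms are numbered consecutively.
     */
    public static List<Tok> analyze(String text, Set<String> stops, boolean keepGaps) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        for (String w : text.toLowerCase().trim().split("\\s+")) {
            if (stops.contains(w)) {
                if (keepGaps) pos++;   // remember that a word stood here
                continue;
            }
            out.add(new Tok(w, pos));
            pos++;
        }
        return out;
    }

    /** Exact phrase match: the same terms at the same relative positions. */
    public static boolean phraseMatches(List<Tok> doc, List<Tok> query) {
        if (query.isEmpty()) return false;
        for (Tok anchor : doc) {
            // Try aligning the first query term with this document token.
            int shift = anchor.pos - query.get(0).pos;
            boolean all = true;
            for (Tok q : query) {
                boolean found = false;
                for (Tok d : doc) {
                    if (d.term.equals(q.term) && d.pos == q.pos + shift) {
                        found = true;
                        break;
                    }
                }
                if (!found) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("is", "the", "was", "a"));
        String doc = "blue is the moon";
        for (boolean keepGaps : new boolean[] {true, false}) {
            List<Tok> d = analyze(doc, stops, keepGaps);
            System.out.println("keepGaps=" + keepGaps
                + "  \"blue was a moon\": "
                + phraseMatches(d, analyze("blue was a moon", stops, keepGaps))
                + "  \"blue moon\": "
                + phraseMatches(d, analyze("blue moon", stops, keepGaps)));
        }
    }
}
```

In this model, keeping the gaps lets "blue was a moon" match the document "blue is the moon" while "blue moon" does not, because the two surviving terms sit three positions apart in the document but only one apart in the query. Dropping the gaps makes both queries match, which is the behaviour the original question asks for.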