RE: What is the proper use of stop words in Lucene?

Uwe Schindler Thu, 24 Apr 2014 03:54:08 -0700

Hi,

You can still change the setting on the TokenFilter after creating it: 
StopFilter#setEnablePositionIncrements(false) - this method was *not* removed!
This fails only if you pass matchVersion>=Version.LUCENE_44. Just use an older 
matchVersion parameter to the constructor and you can still enable this broken 
behavior (for backwards compatibility).
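
To make the effect of that setting concrete, here is a rough sketch in
plain Java. It is not Lucene code: StopGapSketch, analyze and
phraseMatches are names made up for this illustration. It only mimics,
under simplified assumptions, how position increments interact with an
exact phrase match on the example from the original question (stop
words "is" and "the").

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Lucene.
public class StopGapSketch {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("is", "the"));

    // Drops stop words and maps each surviving term's position to the term.
    // With increments enabled, a removed stop word still advances the position
    // of the next term; with them disabled, the gap collapses.
    static Map<Integer, String> analyze(String text, boolean enablePositionIncrements) {
        Map<Integer, String> tokens = new LinkedHashMap<>();
        int position = -1;
        int skipped = 0;
        for (String term : text.trim().toLowerCase().split("\\s+")) {
            if (STOP_WORDS.contains(term)) {
                skipped++;
                continue;
            }
            position += enablePositionIncrements ? skipped + 1 : 1;
            skipped = 0;
            tokens.put(position, term);
        }
        return tokens;
    }

    // Exact phrase match: every query term must occur in the document at the
    // same relative position it has in the analyzed query.
    static boolean phraseMatches(String document, String phrase,
                                 boolean enablePositionIncrements) {
        Map<Integer, String> doc = analyze(document, enablePositionIncrements);
        Map<Integer, String> query = analyze(phrase, enablePositionIncrements);
        if (query.isEmpty()) {
            return false;
        }
        int first = query.keySet().iterator().next();
        for (int start : doc.keySet()) {
            boolean all = true;
            for (Map.Entry<Integer, String> q : query.entrySet()) {
                String term = doc.get(start + q.getKey() - first);
                if (!q.getValue().equals(term)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(analyze("blue is the sky", true));   // {0=blue, 3=sky}
        System.out.println(analyze("blue is the sky", false));  // {0=blue, 1=sky}
        System.out.println(phraseMatches("blue is the sky", "blue sky", true));  // false
        System.out.println(phraseMatches("blue is the sky", "blue sky", false)); // true
    }
}
```

With increments enabled, "sky" keeps position 3, so the exact phrase
"blue sky" does not line up with it. With increments disabled the gap
collapses and the phrase matches, which is the pre-4.4 behavior
described in the original question.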


This is no longer officially supported, but can be a workaround. To me it looks 
like you misunderstood stopwords.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Tincu Gabriel [mailto:tincu.gabr...@gmail.com]
> Sent: Thursday, April 24, 2014 12:27 PM
> To: java-user@lucene.apache.org
> Subject: Re: What is the proper use of stop words in Lucene?
> 
> Hi there,
> The StopFilterFactory can be used to produce StopFilters with the desired
> stop-words inside of it. As a constructor argument it takes a
> Map<String,String> and one of the valid keys you can pass inside of that is
> "enablePositionIncrements". If you don't pass that in then it defaults to
> true.
> Is this what you were looking for?
> 
> 
> On Wed, Apr 23, 2014 at 12:36 PM, Chris Tomlinson <
> chris.j.tomlin...@gmail.com> wrote:
> 
> > Hello,
> >
> > I've written several times now on the list with this question /
> > problem and no one has yet replied so I don't know if the question is
> > too wrong-headed or if there is simply no one reading the list that
> > can comment on the question.
> >
> > The question that I'm trying to get answered is what is the correct
> > way of ignoring stop word gaps in Lucene 4.4+?
> >
> > While we are using Lucene 4.4 embedded in eXist-db (exist-db.org), I
> > think the question is a proper Lucene question and really has nothing
> > to do with the fact that we're using it in an embedded manner.
> >
> > The problem to be solved is how to ignore stop word gaps in queries -
> > without the user having to indicate where such gaps might occur at
> > query time.
> >
> > Since Lucene 4.4 the
> > FilteringTokenFilter.setEnablePositionIncrements(false) is not available.
> > None of the resources such as the "Lucene in Action" and so on explain
> > how to use Lucene to get the desired effect now that 4.4+ has removed
> > the previous approach.
> >
> > Prior to Lucene 4.4 it was possible to
> > setEnablePositionIncrements(false)
> > so that during indexing and querying the number and position of stop
> > word gaps would be ignored (as mentioned on pp 138-139 of "Lucene in
> > Action").
> >
> > This meant that a document with a phrase such as:
> >
> >    blue is the sky
> >
> > with stop words "is" and "the" would be selected by the query:
> >
> >    blue sky
> >
> > This is what we want to achieve.
> >
> > Why? We are working with Tibetan and elisions are not uncommon so
> > that, e.g.:
> >
> >    rin po che
> >
> > on some occasions might be shortened to
> >
> >    rin che
> >
> > and we would like to have a query of
> >
> >    rin po che
> >
> > or
> >
> >    rin che
> >
> > find all occurrences of
> >
> >    rin po che
> >
> > and
> >
> >    rin che
> >
> > without having the user have to mark where elisions might occur.
> >
> > The
> > org.apache.lucene.queryparser.flexible.standard.CommonQueryParserConfiguration
> > provides a setEnablePositionIncrements but that does not seem
> > to work to allow for the above desired query behavior that was
> > possible prior to Lucene 4.4.
> >
> > What is the proper way to ignore stop word gaps?
> >
> > Thank you,
> > Chris
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >


