Of course - you need to use the same analyzer for both indexing and querying. The terms were already dropped when the documents were indexed, so just reindex your data with this new analyzer.
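To make the point concrete, here is a toy sketch in plain Java (no Lucene involved). The class and method names (AnalyzerMismatchDemo, analyze, buildIndex, countHits) and the sample documents are made up for illustration, and the "analyzer" is only a rough stand-in for a real one. It shows that a term dropped at index time cannot be found later, no matter which analyzer the query uses, until the data is reindexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AnalyzerMismatchDemo {

    // Toy "analyzer" (hypothetical): lowercase, split on non-letters, drop stop words.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Toy inverted index: term -> ids of the documents that contain it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, Set<String> stopWords) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id), stopWords)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Counts the documents matching a single-word query.
    static int countHits(Map<String, Set<Integer>> index, String word, Set<String> queryStopWords) {
        List<String> terms = analyze(word, queryStopWords);
        if (terms.isEmpty()) {
            return 0; // the query-side analyzer removed the word
        }
        return index.getOrDefault(terms.get(0), Collections.<Integer>emptySet()).size();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("I am not happy", "No idea", "Very happy");
        Set<String> stopWords = new HashSet<>(Arrays.asList("no", "not"));
        Set<String> noStopWords = Collections.emptySet();

        // Index built with stop words, queried without them: "not" was never indexed.
        Map<String, Set<Integer>> oldIndex = buildIndex(docs, stopWords);
        System.out.println(countHits(oldIndex, "not", noStopWords)); // prints 0

        // Reindexed with the same stop-word-free analyzer used for the query.
        Map<String, Set<Integer>> newIndex = buildIndex(docs, noStopWords);
        System.out.println(countHits(newIndex, "not", noStopWords)); // prints 1
    }
}
```

In the real setup, the equivalent step would be to rebuild the index with the StandardAnalyzer that was given the empty stop word set, then search with that same analyzer.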

-- Jack Krupansky

-----Original Message----- From: Natalia Connolly
Sent: Tuesday, March 18, 2014 10:37 AM
To: java-user@lucene.apache.org
Subject: Re: How to search for terms containing negation

I am afraid this did not work, Tri.  Here's what I tried:

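import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;
List<String> words = new ArrayList<>();
boolean ignoreCase = true;
CharArraySet emptyset = new CharArraySet(Version.LUCENE_47, words, ignoreCase);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47, emptyset);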

Here's what happens:

Searching for: no
0 total matching documents
Searching for: not
0 total matching documents

even though I know the documents contain plenty of "no" and "not"s.

Could the problem be further upstream (i.e., words like this aren't even
indexed)?

Thank you,

Natalia




On Mon, Mar 17, 2014 at 3:57 PM, Tri Cao <tm...@me.com> wrote:

StandardAnalyzer has a constructor that takes a stop word set, so I guess
you can pass it an empty set:

http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html#StandardAnalyzer(org.apache.lucene.util.Version, org.apache.lucene.analysis.util.CharArraySet)

QueryParser is probably ok. I rarely use this parser but I don't think it
recognizes "not" in its grammar.

Hope this helps,
Tri


On Mar 17, 2014, at 12:46 PM, Natalia Connolly <natalia.v.conno...@gmail.com> wrote:

Hi Tri,

Thank you so much for your message!

Yes, it looks like the negation terms have indeed been filtered out;
when I query on "no" or "not", I get no results. I am just using
StandardAnalyzer and the classic QueryParser:

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
QueryParser parser = new QueryParser(Version.LUCENE_47, field, analyzer);

Which analyzer/parser would you recommend?

Thank you again,

Natalia







On Mon, Mar 17, 2014 at 3:35 PM, Tri Cao <tm...@me.com> wrote:

Natalia,

First make sure that your analyzers (both index and query analyzers) do
not filter these out as stop words. I think the standard StopFilter list
has "no" and "not". You can check whether your index has these terms by
querying for "no" as a TermQuery. If there is no match for that query,
then you know for sure they have been filtered out.

The next thing to check is your query parser. What query parser are you
using? Some parsers actually understand the "not" term and rewrite it to a
negation query.
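As a rough illustration of the parser concern, here is a hypothetical mini-parser in plain Java. It is not Lucene's parser; the class NotOperatorDemo and its parse method are invented for this sketch. It assumes the convention that only an all-caps NOT is an operator, which, as far as I know, is what the classic QueryParser syntax documentation describes ("boolean operators must be ALL CAPS"), while a lowercase "not" is passed along as an ordinary term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NotOperatorDemo {

    // Hypothetical mini-parser, not Lucene's: an all-caps NOT negates the next
    // word; a lowercase "not" is kept as an ordinary search term.
    // Required terms are marked with "+", excluded terms with "-".
    static List<String> parse(String query) {
        List<String> clauses = new ArrayList<>();
        boolean negateNext = false;
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty query string
            }
            if (word.equals("NOT")) {
                negateNext = true;
            } else {
                clauses.add((negateNext ? "-" : "+") + word.toLowerCase(Locale.ROOT));
                negateNext = false;
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(parse("NOT happy")); // prints [-happy]
        System.out.println(parse("not happy")); // prints [+not, +happy]
    }
}
```

Under that assumption, a lowercase "not" survives parsing as a term, so whether it matches anything then depends only on whether the analyzer kept it in the index.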

Hope this helps,
Tri

On Mar 17, 2014, at 12:02 PM, Natalia Connolly <natalia.v.conno...@gmail.com> wrote:

Hi All,

Is there any way I could construct a query that would not automatically
exclude negation terms (such as "no", "not", etc)? For example, I need to
find strings like "not happy", "no idea", "never available". I tried
using a simple analyzer with combinations such as "not AND happy", and
similar patterns, but it does not work.

Any help would be appreciated!

Natalia




