Yeah, this is kind of tricky and confusing! Here's what happens:

1. The query parser "parses" the input string into individual source terms, each delimited by white space. The escape is removed in this process, but... no analyzer has been called at this stage.

2. The query parser (generator) calls the analyzer for each source term. Your analyzer is called at this stage, but... the escape is already gone, so... the <backslash><slash> mapping rule is not triggered, leaving the slash recorded in the source term from step 1.

You do need the backslash in your original query because a slash introduces a regex query term. It is added by the escape method you call, but the escaping will be gone by the time your analyzer is called.
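You can see both halves of that with a quick two-liner (untested, but this is the idea):

String escaped = QueryParser.escape("one/two");
System.out.println(escaped);
// prints: one\/two -- the backslash is there in the query string, but the
// parser strips it again and hands the bare "one/two" to your analyzer.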

So, just try a simple, unescaped slash in your char mapping table.
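In other words, in the analyzer you show below, something along these lines (untested sketch):

NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
// Map the bare slash; the backslash is already gone by the time the char filter runs.
builder.add("/", " ");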

-- Jack Krupansky

-----Original Message----- From: Luis Pureza
Sent: Tuesday, June 17, 2014 1:43 PM
To: java-user@lucene.apache.org
Subject: Lucene QueryParser/Analyzer inconsistency

Hi,

I'm experiencing a puzzling behaviour with the QueryParser and was hoping
someone around here could help me.

I have a very simple Analyzer that tries to replace forward slashes (/) by
spaces. Because QueryParser forces me to escape strings with slashes before
parsing, I added a MappingCharFilter to the analyzer that replaces "\/"
with a single space. The analyzer is defined as follows:

@Override
protected TokenStreamComponents createComponents(String field, Reader in) {
   NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
   builder.add("\\/", " ");
   Reader mappingFilter = new MappingCharFilter(builder.build(), in);

   Tokenizer tokenizer = new WhitespaceTokenizer(version, mappingFilter);
   return new TokenStreamComponents(tokenizer);
}

Then I use this analyzer in the QueryParser to parse a string with slashes:

String text = QueryParser.escape("one/two");
QueryParser parser = new QueryParser(Version.LUCENE_48, "f",
    new MyAnalyzer(Version.LUCENE_48));
System.err.println(parser.parse(text));

The expected output would be

f:one f:two

However, I get:

f:one/two

The puzzling thing is that when I debug the analyzer, it tokenizes the
input string correctly, returning two tokens instead of one.
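By "debug the analyzer" I mean running it directly over the escaped string, roughly along these lines (sketch, not the exact code):

Analyzer analyzer = new MyAnalyzer(Version.LUCENE_48);
TokenStream ts = analyzer.tokenStream("f", QueryParser.escape("one/two"));
CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
ts.reset();
while (ts.incrementToken()) {
    System.err.println(term.toString()); // prints "one", then "two"
}
ts.end();
ts.close();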

What is going on?

Many thanks,

Luís Pureza

P.S.: I was able to fix this issue temporarily by creating my own tokenizer
that tokenizes on whitespace and slashes. However, I still don't understand
what's going on.
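
Roughly, that workaround tokenizer is just a CharTokenizer that treats '/' like whitespace (class name made up for illustration):

public final class SlashOrWhitespaceTokenizer extends CharTokenizer {
    public SlashOrWhitespaceTokenizer(Version matchVersion, Reader in) {
        super(matchVersion, in);
    }

    @Override
    protected boolean isTokenChar(int c) {
        // Keep everything except whitespace and '/' inside a token.
        return !Character.isWhitespace(c) && c != '/';
    }
}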
