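The classic QueryParser splits the query text on whitespace first and then sends the chunks to the analyzer one at a time, so your PatternReplaceCharFilter only ever sees a single whitespace-delimited chunk, never the whole query string. That is why "film" is dropped on its own while "fiction" survives. See <https://issues.apache.org/jira/browse/LUCENE-2605>.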
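To make the difference concrete, here is a rough sketch in plain Java with no Lucene dependency. It only imitates the two code paths. The class and method names are made up for illustration, and the tokenizer is a crude assumption (split on anything that is not a letter or digit) rather than the real StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: imitates the behaviour described above
// without using any Lucene classes.
public class SplitOnWhitespaceSketch {

    // Same regular expression as the "patternreplace" char filter in the question.
    private static final Pattern FILM = Pattern.compile("(movie|film|picture).*");

    // Stand-in for the analyzer: run the char filter over the text, then
    // split on every run of characters that are not letters or digits.
    // (Assumption: the real StandardTokenizer uses Unicode word-break rules.)
    public static List<String> analyze(String text) {
        String filtered = FILM.matcher(text).replaceAll("");
        List<String> tokens = new ArrayList<>();
        for (String token : filtered.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stand-in for the query parser path: split on whitespace first,
    // then analyze each chunk independently.
    public static List<String> analyzePerChunk(String query) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            tokens.addAll(analyze(chunk));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] queries = {"avatar film fiction", "avatar-film fiction", "avatar-film-fiction"};
        for (String q : queries) {
            System.out.println("\"" + q + "\" whole string: " + analyze(q)
                    + ", per chunk: " + analyzePerChunk(q));
        }
    }
}
```

Under these assumptions, the whole-string path yields only [avatar] for all three inputs, while the per-chunk path yields [avatar, fiction] for the first two and [avatar] for the third, which lines up with the parsed queries you reported.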
--
Steve
www.lucidworks.com

> On Apr 28, 2016, at 5:54 AM, Bahaa Eldesouky <bahaab...@gmail.com> wrote:
>
> I am using org.apache.lucene.queryparser.classic.QueryParser in lucene
> 6.0.0 to parse queries using a CustomAnalyzer as shown below:
>
> public static void testFilmAnalyzer() throws IOException, ParseException {
>     CustomAnalyzer nameAnalyzer = CustomAnalyzer.builder()
>             .addCharFilter("patternreplace",
>                     "pattern", "(movie|film|picture).*",
>                     "replacement", "")
>             .withTokenizer("standard")
>             .build();
>
>     QueryParser qp = new QueryParser("name", nameAnalyzer);
>     qp.setDefaultOperator(QueryParser.Operator.AND);
>     String[] strs = {"avatar film fiction", "avatar-film fiction",
>             "avatar-film-fiction"};
>
>     for (String str : strs) {
>         System.out.println("Analyzing \"" + str + "\":");
>         showTokens(str, nameAnalyzer);
>         Query q = qp.parse(str);
>         System.out.println("Parsed query of \"" + str + "\":");
>         System.out.println(q + "\n");
>     }
> }
>
> private static void showTokens(String text, Analyzer analyzer) throws
> IOException {
>     StringReader reader = new StringReader(text);
>     TokenStream stream = analyzer.tokenStream("name", reader);
>     CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
>     stream.reset();
>     while (stream.incrementToken()) {
>         System.out.print("[" + term.toString() + "]");
>     }
>     stream.close();
>     System.out.println();
> }
>
> I get the following output, when I invoke testFilmAnalyzer():
>
> Analyzing "avatar film fiction":
> [avatar]
> Parsed query of "avatar film fiction":
> +name:avatar +name:fiction
>
> Analyzing "avatar-film fiction":
> [avatar]
> Parsed query of "avatar-film fiction":
> +name:avatar +name:fiction
>
> Analyzing "avatar-film-fiction":
> [avatar]
> Parsed query of "avatar-film-fiction":
> name:avatar
>
> It seems like the analyzer uses the PatternReplaceCharFilter in its correct
> intended order (i.e. before tokenization), while the QueryParser does so
> afterwards. Does anyone have an explanation for that? Isn't that a bug?