[ https://issues.apache.org/jira/browse/LUCENE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385302#comment-16385302 ]
Uwe Schindler commented on LUCENE-8186:
---------------------------------------

Thanks Robert. Looks OK, although horrible. How about CharFilters? Do they have the same problem?

> CustomAnalyzer with a LowerCaseTokenizerFactory fails to normalize multiterms
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-8186
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8186
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: LUCENE-8186.patch
>
>
> While working on SOLR-12034, a unit test that relied on the LowerCaseTokenizerFactory failed. After some digging, I was able to replicate this at the Lucene level.
> Unit test:
> {noformat}
> @Test
> public void testLCTokenizerFactoryNormalize() throws Exception {
>   Analyzer analyzer = CustomAnalyzer.builder()
>       .withTokenizer(LowerCaseTokenizerFactory.class).build();
>   //fails
>   assertEquals(new BytesRef("hello"), analyzer.normalize("f", "Hello"));
>
>   //now try an integration test with the classic query parser
>   QueryParser p = new QueryParser("f", analyzer);
>   Query q = p.parse("Hello");
>   //passes
>   assertEquals(new TermQuery(new Term("f", "hello")), q);
>   q = p.parse("Hello*");
>   //fails
>   assertEquals(new PrefixQuery(new Term("f", "hello")), q);
>   q = p.parse("Hel*o");
>   //fails
>   assertEquals(new WildcardQuery(new Term("f", "hel*o")), q);
> }
> {noformat}
> The problem is that CustomAnalyzer iterates through the token filters during normalization but does not call the tokenizer, which, in the case of LowerCaseTokenizer, is what actually does the lowercasing.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
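The failure mode described above can be illustrated without Lucene. The following is a minimal, self-contained Java sketch (not the real Lucene API; all names here are hypothetical) of a normalize() that only walks the filter chain. When the lowercasing lives in the tokenizer itself, as with LowerCaseTokenizer, such a normalize() is a no-op and "Hello" comes back unchanged:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the bug: normalize() applies only the token
// filters, never the tokenizer, so tokenizer-side lowercasing is skipped.
public class NormalizeSketch {

    // Stand-in for a tokenizer that lowercases as it tokenizes,
    // in the spirit of LowerCaseTokenizer.
    static String lowerCaseTokenize(String input) {
        return input.toLowerCase();
    }

    // Simplified normalize(): runs the term through the filter chain only.
    // This mirrors the issue: the tokenizer is never consulted.
    static String normalize(String term, List<UnaryOperator<String>> filters) {
        String out = term;
        for (UnaryOperator<String> f : filters) {
            out = f.apply(out);
        }
        return out;
    }

    public static void main(String[] args) {
        // The analyzer has no lowercase *filter*; lowercasing is in the tokenizer.
        List<UnaryOperator<String>> filters = List.of();

        // normalize() leaves the term untouched: "Hello", not "hello".
        System.out.println(normalize("Hello", filters));

        // The tokenizer path would have lowercased it: "hello".
        System.out.println(lowerCaseTokenize("Hello"));
    }
}
```

This matches the test above: indexed terms go through the tokenizer and come out lowercased, while multiterm queries (prefix, wildcard) go through normalize() and keep their original case, so the two sides disagree.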