[ https://issues.apache.org/jira/browse/SOLR-11462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated SOLR-11462:
-------------------------------
    Affects Version/s: master (8.0)

> TokenizerChain's normalize() doesn't work
> -----------------------------------------
>
>                 Key: SOLR-11462
>                 URL: https://issues.apache.org/jira/browse/SOLR-11462
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: master (8.0)
>            Reporter: Tim Allison
>            Priority: Trivial
>
> TokenizerChain's {{normalize()}} is not currently used, so this bug has no
> negative effect on search today. It is still a bug, and we should fix it.
> If {{normalize()}} is applied to a TokenizerChain with {{filters.length > 1}},
> only the last multi-term-aware filter takes effect, because each filter is
> created on the original {{in}} rather than on the accumulated {{result}}.
>
> {noformat}
>   @Override
>   protected TokenStream normalize(String fieldName, TokenStream in) {
>     TokenStream result = in;
>     for (TokenFilterFactory filter : filters) {
>       if (filter instanceof MultiTermAwareComponent) {
>         filter = (TokenFilterFactory) ((MultiTermAwareComponent) filter).getMultiTermComponent();
>         result = filter.create(in);
>       }
>     }
>     return result;
>   }
> {noformat}
> The fix is trivial:
> {noformat}
> -        result = filter.create(in);
> +        result = filter.create(result);
> {noformat}
> If you'd like to swap out {{TextField#analyzeMultiTerm()}} with, say:
> {noformat}
>   public static BytesRef analyzeMultiTerm(String field, String part, Analyzer analyzerIn) {
>     if (part == null || analyzerIn == null) return null;
>     return analyzerIn.normalize(field, part);
>   }
> {noformat}
> I'm happy to submit a PR with unit tests. Let me know.
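
The effect of the one-word fix can be illustrated without Lucene or Solr on the classpath. The following is a minimal sketch, not the actual TokenizerChain code: the class `NormalizeChainDemo`, its methods, and the use of `UnaryOperator<String>` as a stand-in for a filter created by a `TokenFilterFactory` are all assumptions made for illustration. It only shows why wrapping `in` instead of `result` drops every filter but the last.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-in: each UnaryOperator<String> plays the role of one
// filter in the chain. Nothing here is Lucene or Solr API.
class NormalizeChainDemo {

    // Mirrors the reported loop: every filter is applied to the original input,
    // so the result of earlier filters is thrown away.
    static String buggyNormalize(String in, List<UnaryOperator<String>> filters) {
        String result = in;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(in); // bug: should chain on result
        }
        return result;
    }

    // Mirrors the proposed fix: each filter is applied to the accumulated result.
    static String fixedNormalize(String in, List<UnaryOperator<String>> filters) {
        String result = in;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(result);
        }
        return result;
    }

    // Two illustrative filters: lowercase, then remove hyphens.
    static List<UnaryOperator<String>> demoFilters() {
        return List.of(
            s -> s.toLowerCase(Locale.ROOT),
            s -> s.replace("-", ""));
    }

    public static void main(String[] args) {
        System.out.println(buggyNormalize("Foo-Bar", demoFilters())); // prints "FooBar"
        System.out.println(fixedNormalize("Foo-Bar", demoFilters())); // prints "foobar"
    }
}
```

With a single filter both versions agree, which is consistent with the report that the problem only appears when more than one filter is in the chain.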