[ https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451271#comment-16451271 ]
Robert Muir commented on LUCENE-8273:
-------------------------------------

Also, I am not sure the name BypassingTokenFilter is the best. It works well for your case (though I think "bypass" may be due to some inertia/history and may not be the best going forward). Maybe it should be "if" instead of "unless".

{code}
// don't lowercase if the term contains an "o" character
TokenStream t = new BypassingTokenFilter(cts, AssertingLowerCaseFilter::new) {
  CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  @Override
  protected boolean bypass() throws IOException {
    return termAtt.toString().contains("o");
  }
};
{code}

But it will look awkward for other cases:

{code}
// apply greek stemmer ("don't bypass") if the token is written in the greek script.
TokenStream t = new BypassingTokenFilter(ts, GreekStemmer::new) {
  ScriptAttribute scriptAtt = addAttribute(ScriptAttribute.class);

  @Override
  protected boolean bypass() throws IOException {
    return scriptAtt.getCode() != UScript.GREEK;
  }
};
{code}

> Add a BypassingTokenFilter
> --------------------------
>
>                 Key: LUCENE-8273
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8273
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265. It would be useful to be able to wrap a TokenFilter
> in such a way that it could optionally be bypassed based on the current state
> of the TokenStream. This could be used to, for example, only apply
> WordDelimiterFilter to terms that contain hyphens.
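
To make the naming point in the comment concrete, here is a minimal sketch of the two polarities side by side. It deliberately uses plain strings rather than Lucene's TokenStream and attribute machinery, and the names ConditionalFilterSketch, filterIf and filterUnless are hypothetical, chosen for illustration only; this is not the API in LUCENE-8273.patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the proposed filter: tokens are plain strings and
// the wrapped "filter" is a String -> String function. Illustration only.
final class ConditionalFilterSketch {

    // "if" polarity: apply the wrapped filter only to tokens matching the predicate.
    static List<String> filterIf(List<String> tokens,
                                 Predicate<String> shouldFilter,
                                 UnaryOperator<String> filter) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(shouldFilter.test(token) ? filter.apply(token) : token);
        }
        return out;
    }

    // "unless" (bypass) polarity: the same behavior with the predicate negated.
    static List<String> filterUnless(List<String> tokens,
                                     Predicate<String> bypass,
                                     UnaryOperator<String> filter) {
        return filterIf(tokens, bypass.negate(), filter);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Foo", "BAR", "Hello");

        // Reads naturally as a bypass: don't lowercase terms containing "o".
        System.out.println(filterUnless(tokens, t -> t.contains("o"), String::toLowerCase));

        // Reads naturally as a condition: uppercase only terms that start with "H".
        System.out.println(filterIf(tokens, t -> t.startsWith("H"), String::toUpperCase));
    }
}
```

Under these assumptions the two forms are interchangeable by negating the predicate, so the choice between them is purely about which reads better at the call site. The second call is the kind of case where the "unless" form would need a negated test, as in the Greek stemmer example.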