[ https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472873#comment-16472873 ]
Steve Rowe commented on LUCENE-8273: ------------------------------------ I've belatedly reviewed the changes committed under this issue - awesome additions! I noticed a few small things that could be improved, and in putting together a patch I found what I think is a bug/omission in the design: {{TermExclusionFilterFactory}} can't be used outside of {{CustomAnalyzer}} because {{ConditionalTokenFilterFactory.inform()}} expects {{setInnerFilters()}} to have already been called, but {{TermExclusionFilterFactory}} doesn't have the ability to do that. As a result, the protected word list can't be loaded. AFAICT this will prevent {{TermExclusionFilterFactory}} from being used in Solr, e.g. My idea to address the problem is to allow {{TermExclusionFilterFactory}} to accept (but not require) as args wrapped filters and those filters' args, and then call {{setInnerFilters()}} from the ctor. I've added a factory test suite that appears to validate the idea. See javadocs in the patch for interface details. Because of the way I produced it, the patch (forthcoming) includes several other changes, some of which may be controversial. I'd be happy to factor out any of these changes if you don't agree with all of them, [~romseygeek]. The other changes, in no particular order: * I think {{TermExclusionFilter}} should be renamed to {{ProtectedTermFilter}} (or similar) - forcing devs&users to deal with *both* "protected" and "excluded" seems pointless to me; it should be either one or the other everywhere. * {{ProtectedTermFilter}} ctor should {{require()}} the protected terms file param rather than {{get()}} it, since it is in fact required. * Added {{TestConditionalTokenFilter.testMultipleConditionalFilters()}} * The boolean sense of {{ConditionalTokenFilter.shouldFilter()}} is backward in the class javadocs: it says that shouldFilter:false will execute the wrapped filters, but AFAICT it's the opposite. * Some other minor javadoc improvements. If the approach is okay, there is a little bit more work to do: some additional testing (more tests for graceful handling of bad input, and a Solr test), and adding Solr ref guide doc. I haven't run precommit or all tests yet, so there may be some outstanding problems. > Add a ConditionalTokenFilter > ---------------------------- > > Key: LUCENE-8273 > URL: https://issues.apache.org/jira/browse/LUCENE-8273 > Project: Lucene - Core > Issue Type: New Feature > Reporter: Alan Woodward > Assignee: Alan Woodward > Priority: Major > Fix For: 7.4 > > Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, > LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, > LUCENE-8273.patch > > > Spinoff of LUCENE-8265. It would be useful to be able to wrap a TokenFilter > in such a way that it could optionally be bypassed based on the current state > of the TokenStream. This could be used to, for example, only apply > WordDelimiterFilter to terms that contain hyphens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org