[ https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451265#comment-16451265 ]
Robert Muir commented on LUCENE-8273:
-------------------------------------
{quote}
I added this to core rather than to the analysis module as it seems to me to be
a utility class like FilteringTokenFilter, which is also in core. But I'm
perfectly happy to move it to analysis-common if that makes more sense to
others.
{quote}
The idea is cool, but I would like to see it more fleshed out (e.g. marked
experimental somewhere) before it goes into core/:
* Improved testing: I'd like to see some edge cases tested, such as both the
"true" and "false" cases on the final token for end(), etc. What happens there
is a little sneaky, so I think it should be hooked into TestRandomChains (it
should probably be added to that test explicitly, wrapping based on a check of
random.nextBoolean() or something equally simple that will exercise all cases).
This may uncover some integration difficulties. In particular, it is not clear
to me how things such as end() work correctly in the general case with this
filter right now.
* Integration with CustomAnalyzer: this would add a generic "if" that allows
branching in analysis chains (there is an issue somewhere for this), which
would be very powerful, so it would be good to plumb it into CustomAnalyzer to
make sure it works well with the factory model. It seems doable with the
functional interface, but that needs to be proven out.
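
To make the "true"/"false" on the final token case concrete, here is a
minimal sketch in plain Java. It is a toy model only: it does not use Lucene's
TokenStream API, so it says nothing about how end() should be delegated, and
the names ConditionalStepSketch, applyConditionally and splitOnHyphen are
hypothetical, not taken from the patch. It just shows a predicate plus a
wrapped step passed as functional interfaces, exercised once with the last
token taking each branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Toy model of a conditional ("bypassing") filter step; not Lucene code. */
public class ConditionalStepSketch {

    /**
     * Applies {@code wrapped} to each token for which {@code shouldApply} is
     * true and passes every other token through unchanged.
     */
    public static List<String> applyConditionally(
            List<String> tokens,
            Predicate<String> shouldApply,
            Function<String, List<String>> wrapped) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (shouldApply.test(token)) {
                out.addAll(wrapped.apply(token)); // wrapped step may emit several tokens
            } else {
                out.add(token); // bypass the wrapped step
            }
        }
        return out;
    }

    /** Hypothetical stand-in for a word-delimiter step: splits on hyphens. */
    public static List<String> splitOnHyphen(String token) {
        return Arrays.asList(token.split("-"));
    }

    public static void main(String[] args) {
        Predicate<String> hasHyphen = t -> t.contains("-");

        // Final token takes the "true" branch.
        System.out.println(applyConditionally(
                Arrays.asList("wifi", "wi-fi"), hasHyphen,
                ConditionalStepSketch::splitOnHyphen)); // prints [wifi, wi, fi]

        // Final token takes the "false" branch.
        System.out.println(applyConditionally(
                Arrays.asList("wi-fi", "wifi"), hasHyphen,
                ConditionalStepSketch::splitOnHyphen)); // prints [wi, fi, wifi]
    }
}
```

A real test would of course have to drive an actual TokenStream through
reset(), incrementToken(), end() and close(); the sketch only pins down which
two orderings need to be covered.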
> Add a BypassingTokenFilter
> --------------------------
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265. It would be useful to be able to wrap a TokenFilter
> in such a way that it could optionally be bypassed based on the current state
> of the TokenStream. This could be used to, for example, only apply
> WordDelimiterFilter to terms that contain hyphens.