[jira] [Commented] (LUCENE-8273) Add a BypassingTokenFilter

Robert Muir (JIRA) Tue, 24 Apr 2018 14:47:35 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451265#comment-16451265 ]


Robert Muir commented on LUCENE-8273:
-------------------------------------

{quote}
I added this to core rather than to the analysis module as it seems to me to be 
a utility class like FilteringTokenFilter, which is also in core. But I'm 
perfectly happy to move it to analysis-common if that makes more sense to 
others.
{quote}

The idea is cool, but I would like to see it more fleshed out (e.g. marked 
experimental somewhere) before it goes into core/:
* improved testing: I'd like to see some edge cases tested, such as both the 
"true" and "false" cases on the final token for end(), etc. What happens there 
is a little sneaky. I think it should be hooked into TestRandomChains (it 
should probably be explicitly added to that test, wrapping based on a check of 
random.nextBoolean() or something simple that will test all cases). This may 
uncover some integration difficulties. In particular, it is not clear to me 
how some stuff such as end() works correctly in the general case with this 
filter right now.
* integration with CustomAnalyzer: this would add a generic "if" that allows 
branching in analysis chains (there is an issue somewhere for this), which 
would be very powerful, so it would be good to plumb it into CustomAnalyzer to 
make sure it can work well with the factory model. It seems doable with the 
functional interface but needs to be proven out.
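
To make the two cases above concrete, here is a minimal plain-Java sketch of the 
"if" branching idea. It is not the attached patch and does not use any Lucene 
API: BypassSketch, analyze and splitOnDelimiters are hypothetical names, terms 
are modelled as plain strings, and end()/offset handling is deliberately left 
out, so it only illustrates which terms reach the wrapped filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical plain-Java model of a bypassing filter; none of these names are Lucene APIs.
final class BypassSketch {

    // Stand-in for a splitting filter: breaks a term on every run of non-alphanumeric characters.
    static List<String> splitOnDelimiters(String term) {
        List<String> parts = new ArrayList<>();
        for (String part : term.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    // Applies the delegate only to terms accepted by the predicate; all other terms pass through unchanged.
    static List<String> analyze(List<String> terms,
                                Predicate<String> shouldFilter,
                                Function<String, List<String>> delegate) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (shouldFilter.test(term)) {
                out.addAll(delegate.apply(term));
            } else {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Predicate<String> hasHyphen = t -> t.indexOf('-') >= 0;
        // Final term takes the "true" branch (it is filtered).
        System.out.println(analyze(Arrays.asList("foo.bar", "wi-fi"),
                hasHyphen, BypassSketch::splitOnDelimiters));
        // Final term takes the "false" branch (it is bypassed).
        System.out.println(analyze(Arrays.asList("wi-fi", "foo.bar"),
                hasHyphen, BypassSketch::splitOnDelimiters));
    }
}
```

The two calls in main() differ only in whether the last term is filtered or 
bypassed, which is the pair of cases a real test would need to cover, with 
end() checked as well.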


> Add a BypassingTokenFilter
> --------------------------
>
>                 Key: LUCENE-8273
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8273
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.



