[jira] [Commented] (LUCENE-7342) WordDelimiterFilter should observe KeywordAttribute to pass these tokens through

David Smiley (JIRA) Thu, 16 Jun 2016 12:51:23 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334544#comment-15334544 ]


David Smiley commented on LUCENE-7342:
--------------------------------------

A separate issue might be to refactor the APIs of TokenFilters that take a 
CharArraySet input so that they instead take a 
{{java.util.function.Predicate<CharSequence>}}.  Advanced users could even 
construct a Predicate instance with access to the AttributeSource, so that it 
can look at whatever attributes it wants, provided that the TokenFilter only 
invokes it while the token stream is positioned at the token in question.
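
To illustrate the Predicate idea, here is a rough sketch in plain Java.  It 
does not use Lucene at all: {{ProtectedTokenSketch}}, 
{{splitUnlessProtected}}, {{fromWords}} and {{fromRegex}} are hypothetical 
names made up for this example, and splitting on '-' merely stands in for 
whatever a real filter would do to an unprotected token.  The point is only 
that a static word list and a regexp can both be expressed as a 
{{Predicate<CharSequence>}} and combined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical stand-in, not a Lucene class.
final class ProtectedTokenSketch {

    // Passes protected tokens through unchanged; splits every other token on '-'.
    static List<String> splitUnlessProtected(List<String> tokens,
                                             Predicate<CharSequence> isProtected) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (isProtected.test(token)) {
                out.add(token);
            } else {
                for (String part : token.split("-")) {
                    if (!part.isEmpty()) {
                        out.add(part);
                    }
                }
            }
        }
        return out;
    }

    // A static word list, expressed as a predicate.
    static Predicate<CharSequence> fromWords(Set<String> words) {
        return cs -> words.contains(cs.toString());
    }

    // A regexp that must match the whole token, expressed as a predicate.
    static Predicate<CharSequence> fromRegex(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return cs -> pattern.matcher(cs).matches();
    }

    public static void main(String[] args) {
        Predicate<CharSequence> isProtected =
                fromWords(Set.of("wi-fi")).or(fromRegex("SKU-\\d+"));
        List<String> tokens = List.of("wi-fi", "SKU-1234", "e-mail");
        // Prints [wi-fi, SKU-1234, e, mail]
        System.out.println(splitUnlessProtected(tokens, isProtected));
    }
}
```

A predicate that consults other attributes would need a reference to the 
AttributeSource, which this sketch does not attempt to model.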

> WordDelimiterFilter should observe KeywordAttribute to pass these tokens 
> through
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-7342
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7342
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: David Smiley
>
> I have a text analysis requirement in which I want certain tokens to not be 
> processed by WordDelimiterFilter -- i.e. they should pass through that 
> filter.  WDF, like several other TokenFilters, has a configurable word list, 
> but this list is static: it produces a concrete CharArraySet.  Thus, for 
> example, I can't filter by a regexp, nor can I filter based on other 
> attributes.
> A simple solution that makes sense to me is to have WDF use KeywordAttribute 
> to know if it should skip the token.  KeywordAttribute seems fairly generic 
> as to how it can be used, although granted today it's only used by the 
> stemmers.  That attribute isn't named "StemmerIgnoreAttribute" or some-such; 
> it's generic so I think it's fine for WDF to use it in a similar way.
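
The keyword-flag approach described in the issue can be modelled roughly as 
follows.  This is a plain-Java sketch, not Lucene code: the {{Token}} class 
with its {{keyword}} field is a hypothetical stand-in for a token carrying 
KeywordAttribute (whose real accessor is {{isKeyword()}}), and 
{{markKeywords}} / {{splitNonKeywords}} are invented names for an upstream 
marking stage and a WDF-like stage that skips marked tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of the idea; none of these types are Lucene classes.
final class KeywordFlagSketch {

    // A token plus a flag playing the role of KeywordAttribute.
    static final class Token {
        final String text;
        final boolean keyword;

        Token(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    // Upstream stage: flags every term that fully matches the pattern.
    static List<Token> markKeywords(List<String> terms, Pattern pattern) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, pattern.matcher(term).matches()));
        }
        return out;
    }

    // Downstream stage: passes flagged tokens through, splits the rest on '-'.
    static List<String> splitNonKeywords(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token token : tokens) {
            if (token.keyword) {
                out.add(token.text);
                continue;
            }
            for (String part : token.text.split("-")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> marked =
                markKeywords(List.of("wi-fi", "e-mail"), Pattern.compile("wi-.*"));
        // Prints [wi-fi, e, mail]
        System.out.println(splitNonKeywords(marked));
    }
}
```

Because the decision is made by an earlier stage, the splitting stage needs no 
word list of its own; it only reads the flag.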



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
