[ 
https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849374#comment-13849374
 ] 

Robert Muir commented on LUCENE-5369:
-------------------------------------

My only thoughts are the usual ones: to me, the analysis chain is not really the 
best tool for the job of cleaning up faceting labels.

These tasks typically don't require tokenization and work on whole values, and 
may require things like extracting values from one field into another. While it's 
true you can do some of this cleanup (casing, trimming, etc.) in the analysis 
chain by (ab)using the fact that FieldCache uninverts indexed values, and by 
combining KeywordTokenizer with filters like this one, it's not very intuitive, 
and you can't do all of it. Something like Solr's update processor chain might 
be a better place to have this support. There is already overlap: e.g., the 
update processor chain can trim field contents as well.
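
To make the "whole value" point concrete, here is a minimal sketch of this kind of 
cleanup done on the raw field value before indexing, with no tokenization involved. 
It uses plain Java only and is not Solr's update processor API: the class 
LabelCleanup, its methods, the Map-based document, and the field names are all 
hypothetical and exist purely for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Solr: cleans up facet labels
// as whole values before they are indexed.
final class LabelCleanup {
    private LabelCleanup() {}

    // Trims surrounding whitespace and upper-cases the entire value.
    // Locale.ROOT keeps the result independent of the JVM's default locale.
    static String normalizeLabel(String raw) {
        if (raw == null) {
            return null;
        }
        return raw.trim().toUpperCase(Locale.ROOT);
    }

    // Copies a normalized version of one field into another field,
    // leaving the source field untouched. Non-string values are ignored.
    static void copyNormalized(Map<String, Object> doc, String from, String to) {
        Object value = doc.get(from);
        if (value instanceof String) {
            doc.put(to, normalizeLabel((String) value));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("agency", "  nasa ");                    // assumed source field
        copyNormalized(doc, "agency", "agency_facet");   // assumed facet field
        System.out.println(doc.get("agency_facet"));
    }
}
```

The second method covers the "extracting values from one field into another" case, 
which a token filter cannot do because it only sees the tokens of a single field.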

> Add an UpperCaseFilter
> ----------------------
>
>                 Key: LUCENE-5369
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5369
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Ryan McKinley
>            Assignee: Ryan McKinley
>            Priority: Minor
>         Attachments: LUCENE-5369-uppercase-filter.patch
>
>
> We should offer a standard way to force upper-case tokens.  I understand that 
> lowercase is safer for general search quality because some uppercase 
> characters can represent multiple lowercase ones.
> However, having upper-case tokens is often nice for faceting (consider 
> normalizing to standard acronyms)



