[ 
https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849582#comment-13849582
 ] 

Ryan McKinley commented on LUCENE-5369:
---------------------------------------

bq. Maybe add a boolean option in the factory/filter? To remove code 
duplication?

Are you suggesting adding a flag to LowerCaseFilter?  I think that is more 
confusing than having a distinct UpperCaseFilter, and the code duplication is 
essentially the minimum code required for a functioning Filter.

bq. to me the analysis chain is not really the best tool to do the job of 
cleaning up faceting labels

I understand and often agree that other tools are more appropriate.  But there 
are lots of cases where the search analysis chain gets you so close to the 
desired display that duplicating things to a specific facet field seems 
redundant.

This is the analyzer I am working with:

{code}
<analyzer>
  <charFilter class="solr.MappingCharFilterFactory" mapping="normalize-my-field-chars.txt"/>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.TrimFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="xxx.UpperCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="path/to/synonyms.txt" ignoreCase="false" expand="false"/>
</analyzer>
{code}
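
As a rough illustration of what this chain does to a single facet value, here is a plain-Java sketch that uses only the JDK and none of the Lucene/Solr classes. Everything in it is an assumption made for illustration: the class name FacetLabelSketch, the synonym entries, and the use of Unicode decomposition as a crude stand-in for ASCII folding. The MappingCharFilter step is left out because the contents of the mapping file are not shown here.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;

public class FacetLabelSketch {
    // Hypothetical stand-in for synonyms.txt: keys are already upper case,
    // mirroring ignoreCase="false" applied after the upper-case step.
    static final Map<String, String> SYNONYMS = Map.of(
        "UNITED STATES", "USA",
        "U.S.A.", "USA");

    static String normalize(String raw) {
        // Keyword tokenizer + trim: treat the whole value as one token.
        String s = raw.trim();
        // Crude approximation of ASCII folding: decompose, then drop
        // combining marks. The real filter covers far more characters.
        s = Normalizer.normalize(s, Normalizer.Form.NFD)
                      .replaceAll("\\p{M}+", "");
        // Upper-case step, locale-independent.
        s = s.toUpperCase(Locale.ROOT);
        // Synonym step: map to one canonical label, or keep the value as is.
        return SYNONYMS.getOrDefault(s, s);
    }

    public static void main(String[] args) {
        System.out.println(normalize("  united states ")); // USA
        System.out.println(normalize("Caf\u00e9"));        // CAFE
    }
}
```

The point of the sketch is only that the upper-case step sits between folding and the synonym lookup, so the synonym file can be written once in the display form.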





> Add an UpperCaseFilter
> ----------------------
>
>                 Key: LUCENE-5369
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5369
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Ryan McKinley
>            Assignee: Ryan McKinley
>            Priority: Minor
>         Attachments: LUCENE-5369-uppercase-filter.patch
>
>
> We should offer a standard way to force upper-case tokens.  I understand that 
> lowercase is safer for general search quality because some uppercase 
> characters can represent multiple lowercase ones.
> However, having upper-case tokens is often nice for faceting (consider 
> normalizing to standard acronyms)



