[jira] [Commented] (LUCENE-5620) LowerCaseFilter.preserveOriginal

Mike Sokolov (JIRA) Sat, 19 Apr 2014 11:51:26 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974932#comment-13974932 ]


Mike Sokolov commented on LUCENE-5620:
--------------------------------------

bq. doing this selectively (only adding additional terms in some cases) is 
pretty complicated if you don't want to screw over length normalization

Interesting point, although it's debatable how strong the effect is. I guess 
it depends on how many tokens the filter chain affects, and on whether that 
number varies significantly from document to document. I tend to think that 
the number of capitalized words, say, will be similar from document to 
document, but of course some data sets will be exceptions.

It makes me wonder whether length normalization shouldn't use max position 
instead of term count when it is available.
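
To make the difference concrete, here is a minimal sketch in plain Java. It 
is not Lucene code: the Tok class, the method names, and the 1/sqrt(length) 
formula are assumptions chosen for illustration. It models a field in which 
each capitalized original is stacked on its lowercased form with a position 
increment of 0, then measures the field length both ways.

```java
import java.util.Arrays;
import java.util.List;

final class LengthNormSketch {

    /** Hypothetical token: term text plus position increment (0 = stacked on the previous token). */
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Field length as a plain count of emitted terms. */
    static int termCount(List<Tok> tokens) {
        return tokens.size();
    }

    /** Field length as the number of positions spanned: the sum of position increments. */
    static int positionCount(List<Tok> tokens) {
        int positions = 0;
        for (Tok t : tokens) {
            positions += t.posInc;
        }
        return positions;
    }

    /** A 1/sqrt(length) normalization, assumed here only for illustration. */
    static double norm(int length) {
        return 1.0 / Math.sqrt(length);
    }

    /** "Apache Lucene rocks", with each capitalized original stacked on its lowercased form. */
    static List<Tok> sample() {
        return Arrays.asList(
            new Tok("apache", 1), new Tok("Apache", 0),
            new Tok("lucene", 1), new Tok("Lucene", 0),
            new Tok("rocks", 1));
    }

    public static void main(String[] args) {
        List<Tok> doc = sample();
        System.out.printf("term count = %d, norm = %.3f%n", termCount(doc), norm(termCount(doc)));
        System.out.printf("positions  = %d, norm = %.3f%n", positionCount(doc), norm(positionCount(doc)));
    }
}
```

In this sample the term count is 5 while the position count is 3, so a norm 
based on term count penalizes the document for its two preserved originals, 
and a norm based on positions does not.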

> LowerCaseFilter.preserveOriginal
> --------------------------------
>
>                 Key: LUCENE-5620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5620
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mike Sokolov
>         Attachments: LUCENE-5620.patch
>
>
> Following closely the model of LUCENE-5437 (which worked on 
> ASCIIFoldingFilter), this patch adds the ability to preserve the original 
> token to LowerCaseFilter.  This is useful if you want an all-lowercase search 
> term to match without regard to case, while search terms with uppercase 
> letters match in a case-sensitive manner. 
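
The matching behaviour described in the issue summary can be sketched in 
plain Java. This is not the attached patch and does not use the Lucene API: 
the class and method names are hypothetical, and the sketch assumes that 
query terms are looked up without being lowercased.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PreserveOriginalSketch {

    /** Index-time analysis: emit the lowercased term, plus the original when it differs. */
    static List<String> indexTerms(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            out.add(lower);
            if (!lower.equals(token)) {
                // In a real token stream this term would share the lowercased term's position.
                out.add(token);
            }
        }
        return out;
    }

    /** Assumes the query term is NOT lowercased, so its case decides what it can match. */
    static boolean matches(String queryTerm, List<String> documentTokens) {
        Set<String> indexed = new HashSet<>(indexTerms(documentTokens));
        return indexed.contains(queryTerm);
    }

    public static void main(String[] args) {
        List<String> capitalized = Arrays.asList("Apache", "Lucene");
        List<String> lowercase = Arrays.asList("apache", "lucene");

        System.out.println(indexTerms(capitalized));
        System.out.println("lucene vs capitalized doc: " + matches("lucene", capitalized));
        System.out.println("lucene vs lowercase doc:   " + matches("lucene", lowercase));
        System.out.println("Lucene vs capitalized doc: " + matches("Lucene", capitalized));
        System.out.println("Lucene vs lowercase doc:   " + matches("Lucene", lowercase));
    }
}
```

Under these assumptions the all-lowercase query "lucene" matches both 
documents, while the query "Lucene" matches only the document that contains 
the capitalized form.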



--
This message was sent by Atlassian JIRA
(v6.2#6252)
