[ https://issues.apache.org/jira/browse/LUCENE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974932#comment-13974932 ]
Mike Sokolov commented on LUCENE-5620:
--------------------------------------

bq. doing this selectively (only adding additional terms in some cases) is pretty complicated if you dont want to screw over length normalization

Interesting point, although it's debatable how strong the effect is - I guess it depends on how many tokens are affected by the filter chain, and whether this varies in any significant way from document to document: I tend to think that the number of capitalized words, say, will be similar from document to document, but of course there will be exceptions in different data sets.

It makes me wonder whether length normalization shouldn't use max position instead of term count when it is available.

> LowerCaseFilter.preserveOriginal
> --------------------------------
>
>                 Key: LUCENE-5620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5620
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mike Sokolov
>         Attachments: LUCENE-5620.patch
>
>
> Following closely the model of LUCENE-5437 (which worked on
> ASCIIFoldingFilter), this patch adds the ability to preserve the original
> token to LowerCaseFilter. This is useful if you want an all-lowercase search
> term to match without regard to case, while search terms with uppercase
> letters match in a case-sensitive manner.
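
To make the term count versus max position distinction in the comment concrete, here is a rough sketch in plain Java. It is not the attached patch and it does not use the Lucene API: the class PreserveOriginalSketch, its Token type, and the helper methods are all hypothetical names invented for illustration. It assumes the preserved original is emitted at the same position as the lowercased term (position increment 0), and only when lowercasing actually changed the token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical plain-Java model of a lowercasing filter with a preserveOriginal option. */
public class PreserveOriginalSketch {

    /** A token as an analysis chain might emit it: term text plus position increment. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Lowercases each word. When preserveOriginal is true and lowercasing changed
     * the word, the original is also emitted, stacked on the same position
     * (increment 0). The emission order is an assumption of this sketch.
     */
    static List<Token> lowerCase(List<String> words, boolean preserveOriginal) {
        List<Token> out = new ArrayList<>();
        for (String word : words) {
            String lower = word.toLowerCase(Locale.ROOT);
            out.add(new Token(lower, 1));
            if (preserveOriginal && !lower.equals(word)) {
                out.add(new Token(word, 0));
            }
        }
        return out;
    }

    /** Length measured as the number of emitted terms, including stacked ones. */
    static int termCount(List<Token> tokens) {
        return tokens.size();
    }

    /** Length measured as the last position reached (sum of position increments). */
    static int maxPosition(List<Token> tokens) {
        int position = 0;
        for (Token token : tokens) {
            position += token.positionIncrement;
        }
        return position;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("Lucene", "indexes", "Text", "quickly");
        List<Token> tokens = lowerCase(doc, true);
        System.out.println("term count   = " + termCount(tokens));
        System.out.println("max position = " + maxPosition(tokens));
    }
}
```

For the four-word input in main, two words are capitalized, so the model emits six terms while the maximum position stays at four. A length measure based on max position would therefore be unaffected by how many tokens happen to get a preserved original, which is the property the comment is wondering about.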