[ https://issues.apache.org/jira/browse/LUCENE-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696683#comment-14696683 ]
Adrien Grand commented on LUCENE-6737:
--------------------------------------

+1

> Add DecimalDigitFilter
> ----------------------
>
>                 Key: LUCENE-6737
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6737
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>             Fix For: Trunk, 5.4
>
>         Attachments: LUCENE-6737.patch
>
>
> TokenFilter that folds all Unicode digits
> (http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:General_Category=Decimal_Number:])
> to 0-9.
> Historically a lot of the impacted analyzers couldn't even tokenize numbers
> at all, but now they use StandardTokenizer for numbers/alphanum tokens. But
> it's usually the case that you will find e.g. a mix of both ASCII digits and
> "native" digits, and today that makes searching difficult.
> Note this only impacts *decimal* digits, hence the name DecimalDigitFilter.
> So no processing of Chinese numerals or anything crazy like that.
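
For illustration only, the folding described in the issue can be sketched in plain Java without any Lucene dependency. This is not the code from LUCENE-6737.patch: the class name DecimalDigitFolding and the method foldDigits are hypothetical, and the sketch operates on a whole String rather than on a token stream. It assumes that "decimal digit" means Unicode general category Decimal_Number (Nd), as the issue description states.

```java
public final class DecimalDigitFolding {
    private DecimalDigitFolding() {}

    /**
     * Returns a copy of the input in which every non-ASCII code point of
     * general category Nd is replaced by the ASCII digit with the same value.
     * All other code points are copied unchanged.
     */
    public static String foldDigits(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Iterate by code point so supplementary-plane digits are handled.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (cp > 0x7F && Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                // Character.digit yields the numeric value 0..9 for Nd code points.
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Arabic-Indic digits one, two, three mixed with ASCII text and digits.
        System.out.println(foldDigits("\u0661\u0662\u0663 abc 456")); // prints "123 abc 456"
        // Fullwidth digits four and two.
        System.out.println(foldDigits("\uFF14\uFF12")); // prints "42"
        // The CJK numeral for three is a letter (category Lo), so it is left alone.
        System.out.println(foldDigits("\u4E09").equals("\u4E09")); // prints "true"
    }
}
```

The category check is what keeps the sketch limited to decimal digits: Chinese numerals and Roman numeral code points belong to other general categories and pass through untouched.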