[
https://issues.apache.org/jira/browse/LUCENE-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174024#comment-13174024
]
Uwe Schindler commented on LUCENE-3663:
---------------------------------------
One more thing: since you want to filter out tokens, you should not subclass
TokenFilter directly. Instead, subclass
org.apache.lucene.analysis.util.FilteringTokenFilter and do the work in the
match() method. You are free to modify the token there, too. This new base
class correctly handles position increments, which your comments note as a
TODO.
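
To illustrate what "correctly handle position increments" means, here is a
minimal Lucene-free sketch of the bookkeeping such a base class does. It is a
model, not Lucene code: Token, FilteringSketch and filter() are hypothetical
names, and the predicate stands in for the hook called match() above (the
released class appears to name that hook accept()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Lucene-free model of a filtering token filter (all names are hypothetical). */
final class FilteringSketch {

    /** A token reduced to the two fields that matter here. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Keeps the tokens that the predicate accepts. The increments of dropped
     * tokens are added to the next kept token, so the absolute position of
     * every surviving token is the same as before filtering.
     */
    static List<Token> filter(List<Token> input, Predicate<Token> accept) {
        List<Token> output = new ArrayList<>();
        int skipped = 0; // increments of tokens dropped since the last kept one
        for (Token token : input) {
            if (accept.test(token)) {
                output.add(new Token(token.term, token.positionIncrement + skipped));
                skipped = 0;
            } else {
                skipped += token.positionIncrement;
            }
        }
        return output;
    }
}
```

If the middle token of three is dropped, the third token comes out with an
increment of 2 instead of 1, so phrase and span queries still see the gap.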
> Add a phone number normalization TokenFilter
> --------------------------------------------
>
> Key: LUCENE-3663
> URL: https://issues.apache.org/jira/browse/LUCENE-3663
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Santiago M. Mola
> Priority: Minor
> Attachments: PhoneFilter.java
>
>
> Phone numbers can be found in the wild in an infinite variety of formats
> (e.g. with spaces, parentheses, dashes, with or without a country code, with
> letters substituted for digits). Some Lucene applications could therefore
> benefit from phone normalization with a TokenFilter that takes a phone number
> in any format and outputs it in a standard format, using a default country to
> guess the country code if it is not present.
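
The normalization described in the issue could be sketched roughly as follows.
This is not the attached PhoneFilter.java, whose contents are not shown here.
PhoneNormalizerSketch and normalize() are hypothetical names, the output
format ("+" followed by digits) is an assumption, and the default country is
passed as a calling-code string. The sketch assumes the input is already known
to be a phone number, and it does not strip national trunk prefixes.

```java
/** Hypothetical phone normalization helper; a sketch, not the attached filter. */
final class PhoneNormalizerSketch {

    // Digits for the letters 'A'..'Z' on a standard telephone keypad.
    private static final String KEYPAD = "22233344455566677778889999";

    /**
     * Returns "+" followed by digits, or null if the input contains no digits
     * or letters. defaultCountryCode is prepended only when the input does not
     * start with '+'.
     */
    static String normalize(String raw, String defaultCountryCode) {
        String trimmed = raw.trim();
        boolean hasCountryCode = trimmed.startsWith("+");
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < trimmed.length(); i++) {
            char c = Character.toUpperCase(trimmed.charAt(i));
            if (c >= '0' && c <= '9') {
                digits.append(c);
            } else if (c >= 'A' && c <= 'Z') {
                digits.append(KEYPAD.charAt(c - 'A'));
            }
            // Anything else (spaces, dashes, parentheses, dots) is dropped.
        }
        if (digits.length() == 0) {
            return null; // nothing usable; a filter could drop this token
        }
        return "+" + (hasCountryCode ? "" : defaultCountryCode) + digits;
    }
}
```

A filter built on this could replace the term text with the returned value and
reject the token when the result is null.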