[
https://issues.apache.org/jira/browse/LUCENE-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174046#comment-13174046
]
Santiago M. Mola commented on LUCENE-3663:
------------------------------------------
@Uwe: Thanks for the comments.
@Robert: Then this filter would mark phone tokens with type <PHONE>, and I
could drop non-<PHONE> tokens with a subsequent filter? In my specific use
case, I need to throw away any token that could not be normalized, so I have
to at least mark phone tokens so that later steps can remove the others. If
tokens are not marked, we would have to check twice whether each token is a
valid phone number.
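
The mark-then-filter idea discussed above could be sketched roughly as follows.
This is plain Java rather than Lucene code: the Token class, the <OTHER> type
label, and the 7 to 15 digit validity rule are all assumptions made for
illustration, not part of the attached PhoneFilter.java or of any Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of "mark once, filter later"; no name here is a Lucene API. */
public class PhoneTypeSketch {

    static final String PHONE_TYPE = "<PHONE>"; // type label from the discussion
    static final String OTHER_TYPE = "<OTHER>"; // hypothetical label for everything else

    /** Minimal stand-in for a token that carries a term and a type. */
    static final class Token {
        final String term;
        final String type;

        Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    /** Hypothetical normalizer: returns the digits, or null if the term is not phone-like. */
    static String normalizeOrNull(String term) {
        // Drop whitespace, parentheses, dots and dashes.
        String digits = term.replaceAll("[\\s().-]", "");
        // Assumption: 7 to 15 digits counts as valid; real validation is much stricter.
        return digits.matches("\\d{7,15}") ? digits : null;
    }

    /** Stage 1: validate and normalize once, recording the outcome in the token type. */
    static List<Token> markPhones(List<String> terms) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            String normalized = normalizeOrNull(term);
            if (normalized != null) {
                out.add(new Token(normalized, PHONE_TYPE));
            } else {
                out.add(new Token(term, OTHER_TYPE));
            }
        }
        return out;
    }

    /** Stage 2: keep only tokens typed as phones, without validating them again. */
    static List<Token> keepPhones(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            if (PHONE_TYPE.equals(t.type)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> marked = markPhones(Arrays.asList("call", "(555) 123-4567", "now"));
        for (Token t : keepPhones(marked)) {
            System.out.println(t.term + " " + t.type);
        }
    }
}
```

The second stage only compares type strings, so the validity check runs once
per token. In Lucene itself the type would presumably be carried by the
token's TypeAttribute rather than by a field on a custom class.
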
> Add a phone number normalization TokenFilter
> --------------------------------------------
>
> Key: LUCENE-3663
> URL: https://issues.apache.org/jira/browse/LUCENE-3663
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Santiago M. Mola
> Priority: Minor
> Attachments: PhoneFilter.java
>
>
> Phone numbers can be found in the wild in an infinite variety of formats
> (e.g. with spaces, parentheses, dashes, with or without country code, with
> letters substituted for digits). Some Lucene applications could therefore
> benefit from phone normalization via a TokenFilter that takes a phone number
> in any format and outputs it in a standard format, using a default country to
> guess the country code if it is not present.
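
As a rough illustration of the normalization described in the issue, here is a
minimal plain-Java sketch. The class and method names are hypothetical, the
output shape ("+" followed by country code and digits) is an assumption about
what "standard format" means, and the logic ignores trunk prefixes,
international dialing prefixes, and length checks that a real implementation
would need. It is not the attached PhoneFilter.java.

```java
/** Plain-Java sketch of phone normalization; names and output format are assumptions. */
public class PhoneNormalizeSketch {

    // Standard telephone keypad mapping for A..Z (ABC=2, DEF=3, ..., WXYZ=9).
    private static final String KEYPAD = "22233344455566677778889999";

    /**
     * Returns "+" followed by country code and digits, or null if the input
     * contains characters that cannot be part of a phone number.
     *
     * @param raw                the phone number as found in the text
     * @param defaultCountryCode digits to prepend when raw has no leading '+'
     */
    static String normalize(String raw, String defaultCountryCode) {
        String s = raw.trim();
        boolean hasCountryCode = s.startsWith("+");
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = Character.toUpperCase(s.charAt(i));
            if (c >= '0' && c <= '9') {
                digits.append(c);
            } else if (c >= 'A' && c <= 'Z') {
                digits.append(KEYPAD.charAt(c - 'A')); // letters standing in for digits
            } else if (c == ' ' || c == '(' || c == ')' || c == '-' || c == '.') {
                // Separator: skip it.
            } else if (c == '+' && i == 0) {
                // Leading plus: already recorded in hasCountryCode.
            } else {
                return null; // anything else means this is not a phone number
            }
        }
        if (digits.length() == 0) {
            return null;
        }
        // Assumption: without a leading '+', the number belongs to the default country.
        return "+" + (hasCountryCode ? "" : defaultCountryCode) + digits;
    }

    public static void main(String[] args) {
        System.out.println(normalize("(555) 123-4567", "1"));
        System.out.println(normalize("+34 912 345 678", "1"));
        System.out.println(normalize("800-FLOWERS", "1"));
    }
}
```

Returning null for input that cannot be normalized gives a caller a single
place to decide whether to drop the token or pass it through unchanged.
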