[ 
https://issues.apache.org/jira/browse/LUCENE-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174046#comment-13174046
 ] 

Santiago M. Mola commented on LUCENE-3663:
------------------------------------------

@Uwe: Thanks for the comments.

@Robert: So this filter would mark phone tokens with the <PHONE> type, and I 
could then drop non-<PHONE> tokens with a subsequent filter? In my specific use 
case, I need to throw away any token that cannot be normalized, so I have to, 
at least, mark phone tokens so that later steps can remove everything else. If 
tokens are not marked, we would have to check twice whether each token is a 
valid phone number.
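
To make the mark-then-filter idea concrete, here is a rough stdlib-only sketch. 
It is not the attached PhoneFilter.java and it does not use the Lucene API: the 
Token class, normalizeOrNull(), markPhones() and keepType() are all hypothetical 
names, and the validation rule (7 to 15 digits, optional leading '+') is a naive 
stand-in for real phone parsing. In Lucene the type label would presumably be 
carried by the token's TypeAttribute.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model of a mark-then-filter token pipeline (not the Lucene API). */
class PhoneMarkingSketch {

    /** A token with its text and a type label, loosely mirroring a term plus its type. */
    static final class Token {
        final String text;
        final String type;
        Token(String text, String type) { this.text = text; this.type = type; }
    }

    static final String PHONE = "<PHONE>";
    static final String WORD = "word";

    /**
     * Hypothetical, deliberately naive normalizer: strips whitespace, dashes, dots
     * and parentheses, then accepts 7 to 15 digits with an optional leading '+'.
     * Returns null when the input does not look like a phone number.
     */
    static String normalizeOrNull(String raw) {
        String stripped = raw.replaceAll("[\\s().-]", "");
        return stripped.matches("\\+?\\d{7,15}") ? stripped : null;
    }

    /** Stage 1: validate each token once; rewrite and retype the ones that normalize. */
    static List<Token> markPhones(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            String normalized = normalizeOrNull(t.text);
            out.add(normalized != null ? new Token(normalized, PHONE) : t);
        }
        return out;
    }

    /** Stage 2: keep only tokens of the given type; no second validation is needed. */
    static List<Token> keepType(List<Token> in, String type) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            if (type.equals(t.type)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("call", WORD));
        tokens.add(new Token("(555) 123-4567", WORD));
        tokens.add(new Token("12ab", WORD));
        // Only the token that normalized in stage 1 survives stage 2.
        for (Token t : keepType(markPhones(tokens), PHONE)) {
            System.out.println(t.text + " " + t.type);
        }
    }
}
```

The point of the split is that validity is decided exactly once, in stage 1; 
stage 2 only compares type labels, so tokens that could not be normalized are 
dropped without being parsed a second time.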
                
> Add a phone number normalization TokenFilter
> --------------------------------------------
>
>                 Key: LUCENE-3663
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3663
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Santiago M. Mola
>            Priority: Minor
>         Attachments: PhoneFilter.java
>
>
> Phone numbers can be found in the wild in an infinite variety of formats 
> (e.g. with spaces, parentheses, dashes, with or without country code, with 
> letters in place of digits). So some Lucene applications can benefit from 
> phone normalization with a TokenFilter that takes a phone number in any 
> format and outputs it in a standard format, using a default country to guess 
> the country code if it is not present.
