[ 
https://issues.apache.org/jira/browse/PIG-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174502#comment-13174502
 ] 

Prashant Kommireddi commented on PIG-2443:
------------------------------------------

1. IsNumeric does not check for Long/Double range at all. It's simply a check to 
verify whether a String contains ONLY digits. The reason to implement 
this is to give users the ability to check for numeric"ness", not 
necessarily to cast the value back to a data type.
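
A minimal sketch of such a digits-only check, for illustration. This is not the 
code from the attached patch: the class name DigitsOnlyCheck and the choice to 
reject signs, decimal points and empty strings are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from isIntNumeric.patch.
final class DigitsOnlyCheck {
    // One or more ASCII digits and nothing else; compiled once and reused.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Returns true only if the whole string consists of digits.
    // No range check is performed, so the length of the input does not matter.
    static boolean isNumeric(String s) {
        return s != null && DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("12345")); // true
        System.out.println(isNumeric("12a45")); // false
        System.out.println(isNumeric("-12"));   // false: the sign is not a digit
        System.out.println(isNumeric(""));      // false: at least one digit is required
    }
}
```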

Example: At my previous company we stored item listing IDs as numeric values. 
These IDs could go well beyond the range of Long/Double. If I tried 
to check for numeric"ness" based on a certain data type (long, double), the 
check would fail. 
I implemented this because I currently use it only to SPLIT on 
numeric"ness" in the log files. Once I have done the SPLIT I do not cast 
the field to a particular data type, and the field on which I call IsNumeric 
can be arbitrary in length.
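
To make the range issue concrete, here is a small sketch that compares a 
type-based check with a digits-only check on a long ID. The 30-digit ID is 
made up, and the class and method names are hypothetical, not part of the patch.

```java
// Hypothetical illustration; names and the sample ID are assumptions.
final class RangeVersusDigits {
    // Type-based check: succeeds only if the value fits in a Java long.
    static boolean fitsInLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Digits-only check: independent of any numeric type's range.
    static boolean allDigits(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String id = "123456789012345678901234567890"; // 30 digits, exceeds Long.MAX_VALUE
        System.out.println(fitsInLong(id)); // false
        System.out.println(allDigits(id));  // true
    }
}
```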

2. Good point again. I do not expect a huge gain, but a regex match will in most 
cases be slightly faster than parseDouble. Just to reiterate, the primary goal 
of implementing IsNumeric is not performance.
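
For reference, the two approaches being compared might look roughly like the 
sketch below. The class and method names are hypothetical, and no timing is 
measured here. Note that the two checks also accept different inputs: 
Double.parseDouble allows signs, decimal points and exponents, which a 
digits-only pattern rejects.

```java
import java.util.regex.Pattern;

// Hypothetical side-by-side of the two checks; not taken from the patch.
final class RegexVersusParse {
    // Precompiled so the pattern is not rebuilt on every call.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Regex approach: pure pattern match, no exception handling involved.
    static boolean byRegex(String s) {
        return s != null && DIGITS.matcher(s).matches();
    }

    // Parse approach: relies on NumberFormatException for non-numeric input.
    static boolean byParseDouble(String s) {
        if (s == null) {
            return false; // Double.parseDouble(null) would throw NullPointerException
        }
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12345", "1e5", "3.14", "abc"}) {
            System.out.println(s + " regex=" + byRegex(s) + " parse=" + byParseDouble(s));
        }
    }
}
```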

I think IsNumeric is a nice-to-have UDF. But if it sounds like it would confuse 
users more than it's worth, we could just stick to IsInt/IsLong etc. 

                
> [Piggybank] Add UDFs to check if a String is an Integer And if a String is 
> Numeric
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-2443
>                 URL: https://issues.apache.org/jira/browse/PIG-2443
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>         Attachments: isIntNumeric.patch, isIntNumeric.patch
>
>
> UDF that could be used to check if a String is numeric (or an Integer). 
> Several tools such as Splunk, AbInitio have this UDF built-in and companies 
> making an effort to move to Hadoop/Pig could use this.
> Use Case:
> In raw logs there are certain filters/conditions applied based on whether a 
> particular field/value is numeric or not. For example, SPLIT A INTO CATEGORY1 IF 
> IsInt($0), CATEGORY2 IF !IsInt($0);


        
