[
https://issues.apache.org/jira/browse/PIG-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174502#comment-13174502
]
Prashant Kommireddi commented on PIG-2443:
------------------------------------------
1. IsNumeric does not check for Long/Double range at all. It is simply a check
to verify whether a String contains ONLY digits or not. The reason to implement
this is to give users the ability to check for numeric"ness", not necessarily
to cast the value to a data type.
Example: At my previous company we stored item listings as a Numeric value.
These Item Listing IDs could go well beyond the range of Long/Double. If I try
to check for numeric"ness" based on a certain data type (long, double) it would
fail.
I implemented this because I currently use it only to SPLIT records based on
numeric"ness" in the log files. Once the SPLIT is done, I do not cast the field
to a particular data type, and the field on which I call IsNumeric can be
arbitrary in length.
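
To illustrate point 1, here is a minimal sketch of a digits-only check next to
a Long-based check. It is not the code from the attached patch; the class name,
method names, and the pattern [0-9]+ are assumptions made for illustration, and
a real Piggybank UDF would wrap the same logic in an EvalFunc.

```java
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not the UDF from the attached patch.
final class NumericCheckSketch {
    // Assumed pattern: one or more ASCII digits and nothing else.
    private static final Pattern DIGITS_ONLY = Pattern.compile("[0-9]+");

    // True when the whole string consists of digits, whatever its length.
    static boolean isNumeric(String s) {
        return s != null && DIGITS_ONLY.matcher(s).matches();
    }

    // True only when the string parses as a value within the range of long.
    static boolean fitsInLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A 26-digit ID: digits only, but far beyond Long.MAX_VALUE.
        String listingId = "99999999999999999999999999";
        System.out.println(isNumeric(listingId));  // true
        System.out.println(fitsInLong(listingId)); // false
    }
}
```

The digits-only check accepts the long ID because it never tries to convert
the value, which is the behaviour needed when the field is only used in a SPLIT.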
2. Good point again. I do not expect a huge gain, but a regex match will in
most cases be slightly faster than parseDouble. Just to reiterate, the primary
goal of implementing IsNumeric is not performance.
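
Apart from speed, the two approaches in point 2 do not accept the same inputs.
The sketch below compares a digits-only regex with a parseDouble-based check.
The class and method names are hypothetical, the pattern is an assumption, and
no timing is measured here.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the two checks; names are illustrative only.
final class RegexVsParseDoubleSketch {
    // Assumed pattern: one or more ASCII digits and nothing else.
    private static final Pattern DIGITS_ONLY = Pattern.compile("[0-9]+");

    // Regex-based check: the whole string must be digits.
    static boolean digitsOnly(String s) {
        return s != null && DIGITS_ONLY.matcher(s).matches();
    }

    // parseDouble-based check: anything Double.parseDouble accepts passes.
    static boolean parsesAsDouble(String s) {
        if (s == null) {
            return false; // Double.parseDouble(null) throws NullPointerException
        }
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Inputs chosen to show where the two checks disagree.
        String[] samples = {"12345", "1e5", "NaN", "-7.5", "abc"};
        for (String s : samples) {
            System.out.println(s + " digitsOnly=" + digitsOnly(s)
                    + " parsesAsDouble=" + parsesAsDouble(s));
        }
    }
}
```

Inputs such as "1e5" or "NaN" parse as doubles but are not digit strings, so a
parseDouble-based check would put them on the numeric side of a SPLIT.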
I think IsNumeric is a nice-to-have UDF. But if it sounds like it would confuse
users more than it is worth, we could just stick to IsInt/IsLong etc.
> [Piggybank] Add UDFs to check if a String is an Integer And if a String is
> Numeric
> ----------------------------------------------------------------------------------
>
> Key: PIG-2443
> URL: https://issues.apache.org/jira/browse/PIG-2443
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Attachments: isIntNumeric.patch, isIntNumeric.patch
>
>
> UDF that could be used to check if a String is numeric (or an Integer).
> Several tools such as Splunk, AbInitio have this UDF built-in and companies
> making an effort to move to Hadoop/Pig could use this.
> Use Case:
> In raw logs there are certain filters/conditions applied based on whether a
> particular field/value is numeric or not. For example, SPLIT A INTO CATEGORY1
> IF IsInt($0), CATEGORY2 IF NOT IsInt($0);