[
https://issues.apache.org/jira/browse/PIG-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173988#comment-13173988
]
Prashant Kommireddi commented on PIG-2443:
------------------------------------------
Proposal to implement 2 UDFs
1. IsInt
2. IsNumeric
IsInt is used to check whether the String input is an Integer. Note that this
function checks against the Integer range -2,147,483,648 to 2,147,483,647.
Use IsNumeric instead if you only need to check whether a String is numeric.
IsNumeric also performs better, as it is a regex match, whereas IsInt makes a
call to Integer.parseInt(String input).
IsInt checks whether making a call to Integer.parseInt results in a
NumberFormatException and returns the boolean accordingly.
{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.PigWarning;
import org.apache.pig.data.Tuple;

public class IsInt extends EvalFunc<Boolean> {
    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) return false;
        try {
            String str = (String) input.get(0);
            if (str == null || str.length() == 0) return false;
            // Throws NumberFormatException if str is not a valid int.
            Integer.parseInt(str);
        } catch (NumberFormatException nfe) {
            return false;
        } catch (ClassCastException e) {
            // The first field was not a String (chararray).
            warn("Unable to cast input " + input.get(0) + " of class "
                    + input.get(0).getClass() + " to String",
                    PigWarning.UDF_WARNING_1);
            return false;
        }
        return true;
    }
}
{code}
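To illustrate the range behaviour outside of Pig, here is a rough standalone sketch of the same check. It is not part of the patch: the class name IsIntSketch and the method isInt are made up for this example, and it leaves out the Tuple handling and the ClassCastException warning.

```java
// Standalone sketch of the IsInt check, without the Pig EvalFunc wrapper.
// Class and method names are hypothetical, not taken from the patch.
final class IsIntSketch {
    // Returns true only if s parses as a 32-bit signed integer.
    static boolean isInt(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException nfe) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isInt("2147483647"));  // true: Integer.MAX_VALUE
        System.out.println(isInt("2147483648"));  // false: one past MAX_VALUE
        System.out.println(isInt("-2147483648")); // true: Integer.MIN_VALUE
        System.out.println(isInt("12a"));         // false: not a number
    }
}
```

A value just outside the int range is rejected even though it consists only of digits, which is the case where IsNumeric is the better fit.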
IsNumeric makes a regex match against the input to check whether all characters
are numeric digits. A single leading "-" is allowed.
{code}
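import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.PigWarning;
import org.apache.pig.data.Tuple;

public class IsNumeric extends EvalFunc<Boolean> {
    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return false;
        try {
            String str = (String) input.get(0);
            if (str == null || str.length() == 0)
                return false;
            // Allow a single leading minus sign.
            if (str.startsWith("-"))
                str = str.substring(1);
            // \\d+ rather than \\d*, so that a lone "-" is rejected.
            return str.matches("\\d+");
        } catch (ClassCastException e) {
            // The first field was not a String (chararray).
            warn("Unable to cast input " + input.get(0) + " of class "
                    + input.get(0).getClass() + " to String",
                    PigWarning.UDF_WARNING_1);
            return false;
        }
    }
}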
{code}
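For comparison, here is a rough standalone sketch of a regex-based numeric check. It is not the code from the patch: the names IsNumericSketch, DIGITS and isNumeric are made up, and it assumes a precompiled pattern "-?\\d+" in place of stripping the "-" and calling String.matches. With "\\d+" an empty string or a lone "-" is rejected.

```java
import java.util.regex.Pattern;

// Standalone sketch of a regex-based numeric check, without the Pig wrapper.
// Class, field, and method names are hypothetical, not taken from the patch.
final class IsNumericSketch {
    // Compiled once and reused; String.matches would recompile on each call.
    private static final Pattern DIGITS = Pattern.compile("-?\\d+");

    // Returns true if s is an optional leading '-' followed by one or more digits.
    static boolean isNumeric(String s) {
        return s != null && DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("2147483648")); // true: no range limit
        System.out.println(isNumeric("-42"));        // true
        System.out.println(isNumeric("-"));          // false: no digits
        System.out.println(isNumeric("12.5"));       // false: '.' is not a digit
    }
}
```

Unlike the parseInt-based check, this accepts digit strings of any length, so "2147483648" counts as numeric even though it is not a valid int.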
I have added test cases for both UDFs as well.
> [Piggybank] Add UDFs to check if a String is an Integer And if a String is
> Numeric
> ----------------------------------------------------------------------------------
>
> Key: PIG-2443
> URL: https://issues.apache.org/jira/browse/PIG-2443
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Attachments: isIntNumeric.patch, isIntNumeric.patch
>
>
> UDF that could be used to check if a String is numeric (or an Integer).
> Several tools, such as Splunk and AbInitio, have this UDF built in, and companies
> making an effort to move to Hadoop/Pig could use this.
> Use Case:
> In raw logs there are certain filters/conditions applied based on whether a
> particular field/value is numeric or not. For example: SPLIT A INTO CATEGORY1 IF
> IsInt($0), CATEGORY2 IF NOT IsInt($0);