[
https://issues.apache.org/jira/browse/PIG-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174343#comment-13174343
]
Jonathan Coveney commented on PIG-2443:
---------------------------------------
Yeah, that's what I'd do. I wouldn't obsess over speed yet; I'd just implement
it, see how fast it is, and go from there only if it's prohibitively slow.
The more annoying issue is that, since we're essentially converting the value,
there are going to be two casts when there only needs to be one.
You'll have IsInt() in the split, and then in the resulting relation you'll
still have to cast the field that passed the check over to an int. It'd be nice
if the split could take advantage of what it already knows, so that after the
split the values that passed have the proper schema :int, and the ones that
didn't are still :chararray.
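To make the double-cast point concrete, here is a minimal sketch in plain Java.
It is not the code from the attached patch and it does not use the Pig UDF API;
the class and method names (IntCheckSketch, isInt, checkThenCast, tryParseInt)
are made up for illustration, and it assumes the check is built on
Integer.parseInt.

```java
import java.util.OptionalInt;

// Hypothetical illustration only; not the Piggybank implementation.
final class IntCheckSketch {

    // Roughly what an IsInt-style check does: attempt a parse and report
    // only whether it succeeded. The parsed value is thrown away.
    static boolean isInt(String s) {
        if (s == null) {
            return false;
        }
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Check, then cast: the same string is parsed twice, once inside
    // isInt and once more to obtain the value.
    static int checkThenCast(String s) {
        if (!isInt(s)) {
            throw new IllegalArgumentException("not an int: " + s);
        }
        return Integer.parseInt(s); // second parse of the same input
    }

    // Single pass: parse once and keep the result if it worked.
    static OptionalInt tryParseInt(String s) {
        if (s == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }
}
```

checkThenCast mirrors the split-then-cast pattern described above, while
tryParseInt is the "only one cast" shape. Whether the split itself can be made
to carry the :int schema through is a separate question this sketch does not
address.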
> [Piggybank] Add UDFs to check if a String is an Integer And if a String is
> Numeric
> ----------------------------------------------------------------------------------
>
> Key: PIG-2443
> URL: https://issues.apache.org/jira/browse/PIG-2443
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Attachments: isIntNumeric.patch, isIntNumeric.patch
>
>
> UDF that could be used to check if a String is numeric (or an Integer).
> Several tools such as Splunk, AbInitio have this UDF built-in and companies
> making an effort to move to Hadoop/Pig could use this.
> Use Case:
> In raw logs, certain filters/conditions are applied based on whether a
> particular field/value is numeric or not. For example: SPLIT A INTO CATEGORY1 IF
> IsInt($0), CATEGORY2 IF !IsInt($0);