[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614137#comment-13614137 ]
Thejas M Nair commented on PIG-3259:
------------------------------------
bq. How do we determine the number of non-numbers without making calls to
sanityCheck..()?
By counting the number of times .valueOf() has thrown an exception so far.
Once a threshold has been crossed, we can introduce the sanity check for each
new value. This puts a bound on the worst-case ('incorrect' input) performance
without degrading the 'correct'-case performance by much.
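A minimal Java sketch of the counting idea, assuming a single-threaded converter (one per task) and an arbitrary threshold of 1000. The class name, `THRESHOLD` value and `looksLikeLong()` helper are hypothetical, not Pig code, and only the integer path is shown (the `Double.valueOf` fallback for decimals is left out).

```java
/**
 * Hypothetical sketch, not Pig code: parse optimistically, and only start
 * pre-screening each value after THRESHOLD parse failures have been seen.
 * Not thread-safe; assumes one converter per (single-threaded) task.
 */
public class AdaptiveLongParser {
    // Arbitrary illustrative cutoff; a real value would need tuning.
    static final long THRESHOLD = 1000;

    private long failures = 0;

    public Long parse(String s) {
        // Past the threshold, a cheap character scan rejects obvious
        // non-numbers so they no longer cost an exception each.
        if (failures >= THRESHOLD && !looksLikeLong(s)) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException e) {
            failures++; // remember how often the optimistic path failed
            return null;
        }
    }

    public long failureCount() {
        return failures;
    }

    // Optional sign followed by one or more ASCII digits. Overflow is not
    // detected here; Long.valueOf still rejects out-of-range values.
    static boolean looksLikeLong(String s) {
        if (s == null || s.isEmpty()) return false;
        int i = (s.charAt(0) == '-' || s.charAt(0) == '+') ? 1 : 0;
        if (i == s.length()) return false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }
}
```

Below the threshold, well-formed input pays nothing extra; above it, every value pays for one linear scan, which is typically much cheaper than constructing and throwing a NumberFormatException for each bad value.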
I wonder if there are good libraries we can use for the sanity checks, as
the decimal check seems a bit more complicated.
> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
> Key: PIG-3259
> URL: https://issues.apache.org/jira/browse/PIG-3259
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11, 0.11.1
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric
> (e.g. 1234abcd), the code still calls Double.valueOf(String) before finally
> returning null. Any script that inadvertently (through user error or
> otherwise) casts a non-numeric column to int or long therefore makes many
> wasteful calls.
> We can avoid this by calling Double.valueOf(String) only when the input is
> found to be a decimal number (e.g. 1234.56), and returning null otherwise
> without trying it at all.
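A rough Java sketch of the proposed fast path, under stated assumptions: the bytes are UTF-8 text; "decimal" means an optional sign, digits and at most one '.', with at least one digit; exponent forms such as "1e3" (which Double.valueOf accepts) are deliberately rejected; and integer overflow returns null rather than a saturated value. Leading/trailing whitespace is trimmed to mirror Double.valueOf, whose Javadoc says whitespace is removed as if by String.trim(). The class, enum and method names are hypothetical, not Pig's actual API.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a bytes-to-Long conversion with a pre-check. */
public final class BytesToLong {

    enum Shape { INTEGER, DECIMAL, OTHER }

    // Single pass over the characters; never throws.
    static Shape classify(String s) {
        int n = s.length();
        int i = 0;
        if (i < n && (s.charAt(i) == '-' || s.charAt(i) == '+')) i++;
        boolean digits = false, dot = false;
        for (; i < n; i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                digits = true;
            } else if (c == '.' && !dot) {
                dot = true;
            } else {
                return Shape.OTHER; // any other character, or a second '.'
            }
        }
        if (!digits) return Shape.OTHER;
        return dot ? Shape.DECIMAL : Shape.INTEGER;
    }

    public static Long toLong(byte[] b) {
        if (b == null) return null;
        String s = new String(b, StandardCharsets.UTF_8).trim();
        switch (classify(s)) {
            case INTEGER:
                try {
                    return Long.valueOf(s);
                } catch (NumberFormatException e) {
                    return null; // too large for a long (assumption: no saturation)
                }
            case DECIMAL:
                // Only decimals pay for Double.valueOf; longValue() truncates toward zero.
                return Double.valueOf(s).longValue();
            default:
                return null; // e.g. "1234abcd": Double.valueOf is never called
        }
    }
}
```

Whether rejecting exponent notation and returning null on overflow are acceptable would need to be checked against Pig's existing cast semantics.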
--
This message is automatically generated by JIRA.