[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614137#comment-13614137 ]
Thejas M Nair commented on PIG-3259:
------------------------------------

bq. How do we determine the number of non-numbers without making calls to sanityCheck..()?

By counting the number of times an exception has been thrown so far by .valueOf(). Once that count crosses a threshold, we can introduce the sanity check for each new value. This puts a limit on the worst-case ('incorrect' input) performance without degrading the 'correct' case performance by much.

I wonder if there are good libraries that we can use for the sanity checks, as the decimal check seems a bit more complicated.

> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric
> (1234abcd), the code calls Double.valueOf(String) regardless before finally
> returning null. Any script that inadvertently (user's mistake or not) tries
> to cast a non-numeric column to int or long would result in many wasteful
> calls.
> We can avoid this by handling only the cases where we find the input to be a
> decimal number (1234.56), and returning null otherwise even before trying
> Double.valueOf(String).
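
To make the threshold idea from the comment concrete, here is a minimal sketch. It is not the Pig implementation: the class name ThresholdedLongParser, the method looksNumeric, and the threshold value are all hypothetical, and the pre-check is deliberately simplistic. It counts how often the valueOf() fallback chain ends in an exception and, once the count reaches the threshold, runs a cheap character scan before attempting any conversion.

```java
// Hypothetical sketch of the "count failures, then pre-check" idea.
// None of these names come from Pig; they are illustrative only.
class ThresholdedLongParser {
    private final int threshold; // failures tolerated before pre-checking starts
    private int failures = 0;    // times the valueOf() chain ended in an exception

    ThresholdedLongParser(int threshold) {
        this.threshold = threshold;
    }

    int failureCount() {
        return failures;
    }

    /** Returns the value as a Long, or null if the input is not numeric. */
    Long parse(String s) {
        if (s == null) {
            return null;
        }
        // Only pay for the sanity check once bad input has proven to be common.
        if (failures >= threshold && !looksNumeric(s)) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException e) {
            // Not a plain integer; fall through and try it as a decimal.
        }
        try {
            // Truncates toward zero, e.g. "1234.56" becomes 1234.
            return Double.valueOf(s).longValue();
        } catch (NumberFormatException e) {
            failures++;
            return null;
        }
    }

    /**
     * Assumed, simplified check: optional sign, digits, at most one '.'.
     * It rejects forms that Double.valueOf accepts (exponents such as
     * "1e3", surrounding whitespace, "NaN"), which is one reason a real
     * decimal check is more complicated.
     */
    static boolean looksNumeric(String s) {
        int i = 0;
        int n = s.length();
        if (n > 0 && (s.charAt(0) == '+' || s.charAt(0) == '-')) {
            i++;
        }
        boolean sawDigit = false;
        boolean sawDot = false;
        for (; i < n; i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                sawDigit = true;
            } else if (c == '.' && !sawDot) {
                sawDot = true;
            } else {
                return false;
            }
        }
        return sawDigit;
    }
}
```

With mostly clean input the counter stays below the threshold and every value takes the original path with no extra scan. With dirty input, at most `threshold` values pay for the double exception before the scan starts rejecting them early.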