[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613528#comment-13613528 ]
Prashant Kommireddi commented on PIG-3259:
------------------------------------------

{quote}
The check you have here does not accept all valid double string representations
{quote}

Thanks for noticing that.

{quote}
One way to avoid performance degradation for the 'correct' case would be to start by doing .valueOf() without checks, then use the number of non-numbers encountered to decide whether to make the sanityCheckIntegerLongDecimal() calls
{quote}

I am not clear on the advantage here. How do we determine the number of non-numbers without making calls to sanityCheck..()?

> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric
> (e.g. 1234abcd), the code still calls Double.valueOf(String) before finally
> returning null. Any script that tries to cast a non-numeric column to int
> or long, whether by the user's mistake or not, therefore makes many
> wasteful calls.
> We can avoid this by handling only the case where the input is a decimal
> number (e.g. 1234.56) and returning null otherwise, before even trying
> Double.valueOf(String).
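A minimal Java sketch of the "check before Double.valueOf(String)" idea from the description. The class and method names (ByteToLongSketch, looksLikeDecimal, bytesToLong) are hypothetical; this is not Pig's actual converter code. The scan is deliberately narrower than what Double.valueOf(String) accepts (it rejects NaN, Infinity, hex floats, d/f suffixes and surrounding whitespace), which is the trade-off raised in the first quoted comment.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of "check before Double.valueOf"; not Pig's actual code.
final class ByteToLongSketch {

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Cheap single pass: [+-] digits [. digits] [(e|E) [+-] digits].
    // Needs at least one mantissa digit, so "1234abcd", "abc" and "." fail.
    // Narrower than Double.valueOf(String): rejects "NaN", "Infinity",
    // hex floats, 'd'/'f' suffixes and surrounding whitespace.
    static boolean looksLikeDecimal(String s) {
        int i = 0;
        int n = s.length();
        if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
        int mantissaDigits = 0;
        while (i < n && isAsciiDigit(s.charAt(i))) { i++; mantissaDigits++; }
        if (i < n && s.charAt(i) == '.') {
            i++;
            while (i < n && isAsciiDigit(s.charAt(i))) { i++; mantissaDigits++; }
        }
        if (mantissaDigits == 0) return false;
        if (i < n && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
            i++;
            if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
            int exponentDigits = 0;
            while (i < n && isAsciiDigit(s.charAt(i))) { i++; exponentDigits++; }
            if (exponentDigits == 0) return false;
        }
        return i == n; // any trailing character (e.g. "abcd") means not numeric
    }

    // Returns null for empty, non-numeric or out-of-range input.
    static Long bytesToLong(byte[] b) {
        if (b == null || b.length == 0) return null;
        String s = new String(b, StandardCharsets.UTF_8);
        try {
            return Long.valueOf(s); // fast path: plain integers such as "1234"
        } catch (NumberFormatException e) {
            // Skip the Double parse entirely for input such as "1234abcd".
            if (!looksLikeDecimal(s)) return null;
            // Cannot throw: every string accepted above is valid for parseDouble.
            double d = Double.parseDouble(s);
            // NaN is impossible here; Infinity (e.g. "1e999") fails this range check.
            if (d < Long.MIN_VALUE || d > Long.MAX_VALUE) return null;
            return (long) d; // truncates toward zero: "1234.56" -> 1234
        }
    }

    public static void main(String[] args) {
        for (String in : new String[] {"1234", "1234.56", "1234abcd", "1e3"}) {
            System.out.println(in + " -> " + bytesToLong(in.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

Running main prints "1234 -> 1234", "1234.56 -> 1234", "1234abcd -> null" and "1e3 -> 1000". Non-numeric input still costs one NumberFormatException from Long.valueOf, but the second, Double-based attempt is skipped.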