[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613354#comment-13613354 ]

Thejas M Nair commented on PIG-3259:
------------------------------------

Sounds like a good idea. 
The check you have here does not accept all valid double string representations
(see http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#valueOf(java.lang.String)),
e.g. values with an exponent, or a hexadecimal representation starting with 0x.
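
To illustrate the point above, here is a minimal sketch comparing a strict pre-check with what Double.valueOf(String) actually accepts. The class name and the looksLikePlainDecimal() helper are hypothetical stand-ins for illustration only; they are not the code from the patch.

```java
final class DoubleFormats {

    // Hypothetical strict check: optional sign, digits, at most one '.'.
    // This is an assumption for illustration, not the check in the patch.
    static boolean looksLikePlainDecimal(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = (s.charAt(0) == '-' || s.charAt(0) == '+') ? 1 : 0;
        boolean seenDigit = false;
        boolean seenDot = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                seenDigit = true;
            } else if (c == '.' && !seenDot) {
                seenDot = true;
            } else {
                return false;
            }
        }
        return seenDigit;
    }

    // Returns the parsed value, or null if Double.valueOf rejects the string.
    static Double parseOrNull(String s) {
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"1234.56", "1.5e3", "0x1.8p1", "1234abcd"};
        for (String s : samples) {
            System.out.println(s + " strictCheck=" + looksLikePlainDecimal(s)
                    + " valueOf=" + parseOrNull(s));
        }
    }
}
```

Here "1.5e3" and "0x1.8p1" fail the strict check even though Double.valueOf parses them, so a pre-check of this shape would turn previously valid input into null.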

But it would be better if we can avoid the performance degradation for the 
'correct' [1] case (which seems to be in the range of 2-8% in the micro 
benchmark that ran for at least a few seconds). One way to avoid the 
degradation for the 'correct' case would be to start by calling .valueOf() 
without checks, and then use the number of non-numeric values encountered to 
decide whether to make the sanityCheckIntegerLongDecimal() calls.

[1]  - by 'correct' I mean the case where a field declared as an integer or a 
double has a valid representation.
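
The adaptive approach suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the failure threshold, and the body of sanityCheck() are hypothetical, and sanityCheck() merely stands in for sanityCheckIntegerLongDecimal() from the patch.

```java
final class AdaptiveLongCaster {

    private final int failureThreshold; // assumed tuning knob, not from the patch
    private int failures = 0;

    AdaptiveLongCaster(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    int failureCount() {
        return failures;
    }

    // Hypothetical stand-in for sanityCheckIntegerLongDecimal():
    // optional sign, digits, at most one '.'.
    static boolean sanityCheck(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int i = (s.charAt(0) == '-' || s.charAt(0) == '+') ? 1 : 0;
        boolean seenDigit = false;
        boolean seenDot = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                seenDigit = true;
            } else if (c == '.' && !seenDot) {
                seenDot = true;
            } else {
                return false;
            }
        }
        return seenDigit;
    }

    Long toLong(String s) {
        if (s == null) {
            return null;
        }
        // Pay for the pre-check only after enough bad values have been seen.
        if (failures >= failureThreshold && !sanityCheck(s)) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException e) {
            // Not an integer literal; fall back to the double parse below.
        }
        try {
            return Long.valueOf(Double.valueOf(s).longValue());
        } catch (NumberFormatException e) {
            failures++;
            return null;
        }
    }
}
```

With clean input the counter never reaches the threshold, so the 'correct' case pays nothing for the check. Once the check is switched on, a strict check like this one would also reject exponent and hexadecimal forms, so the two concerns in this comment interact.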
                
> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric 
> (1234abcd), the code calls Double.valueOf(String) regardless before finally 
> returning null. Any script that inadvertently (user's mistake or not) tries 
> to cast a non-numeric column to int or long would result in many wasteful 
> calls. 
> We can avoid this by handling only the cases where we find the input to be a 
> decimal number (1234.56), and returning null otherwise even before trying 
> Double.valueOf(String).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
