[ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614137#comment-13614137
 ] 

Thejas M Nair commented on PIG-3259:
------------------------------------

bq.  How do we determine the number of non-numbers without making calls to 
sanityCheck..()?
By counting the number of times an exception has been thrown so far by 
.valueOf(). Once a threshold has been crossed, we can introduce the sanity 
check for each new value. This puts a limit on worst-case ('incorrect' input) 
performance without degrading the 'correct' case performance by much. 
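
A minimal sketch of the threshold idea, under stated assumptions: the class 
name AdaptiveLongParser, the threshold parameter, and the pluggable 
sanityCheck predicate are all hypothetical and are not taken from Pig's code. 
The Double.valueOf fallback mirrors the behaviour described in the issue.

```java
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not Pig's actual implementation. */
final class AdaptiveLongParser {
    private final int threshold;                 // assumed tunable, value not from the issue
    private final Predicate<String> sanityCheck; // assumed cheap pre-check, supplied by caller
    private int failures = 0;                    // inputs that failed both valueOf() attempts

    AdaptiveLongParser(int threshold, Predicate<String> sanityCheck) {
        this.threshold = threshold;
        this.sanityCheck = sanityCheck;
    }

    /** Returns the parsed value, or null if the input is not numeric. */
    Long parse(String s) {
        if (s == null) {
            return null;
        }
        // Pay for the pre-check only once bad input has proven to be common.
        if (failures >= threshold && !sanityCheck.test(s)) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException notALong) {
            try {
                // Fallback for decimal input such as "1234.56".
                return Double.valueOf(s).longValue();
            } catch (NumberFormatException notANumber) {
                failures++;
                return null;
            }
        }
    }

    int failures() {
        return failures;
    }
}
```

With clean input the counter never reaches the threshold, so the 'correct' 
case pays nothing extra; with dirty input, at most `threshold` values take the 
exception path before the pre-check starts filtering them out.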

I wonder if there are good libraries that we can use for the sanity checks, as 
the decimal check seems a bit more complicated. 
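
For reference, a hand-rolled decimal check could look roughly like the sketch 
below. The class and method names are hypothetical. It is deliberately 
stricter than Double.valueOf(String): it accepts only an optional sign, 
digits, and at most one decimal point, and rejects exponents, surrounding 
whitespace, type suffixes, and values such as NaN or Infinity.

```java
/** Hypothetical pre-check for illustration only; not part of Pig. */
final class DecimalSanityCheck {
    private DecimalSanityCheck() {}

    /** True for strings such as "1234", "-1234.56" or ".5"; false for "1234abcd". */
    static boolean looksLikeDecimal(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = 0;
        char first = s.charAt(0);
        if (first == '+' || first == '-') {
            i = 1; // skip a single leading sign
        }
        boolean seenDigit = false;
        boolean seenDot = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                seenDigit = true;
            } else if (c == '.' && !seenDot) {
                seenDot = true; // allow exactly one decimal point
            } else {
                return false;   // any other character means non-numeric
            }
        }
        return seenDigit;       // rejects "", "-", "." and "+."
    }
}
```

A single pass over the characters with no allocation keeps the check cheap 
compared with constructing and catching a NumberFormatException.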
                
> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric 
> (1234abcd), the code still calls Double.valueOf(String) before finally 
> returning null. Any script that inadvertently (user's mistake or not) tries 
> to cast a non-numeric column to int or long would result in many wasteful 
> calls. 
> We can avoid this by handling only the cases where the input is a decimal 
> number (1234.56) and returning null otherwise, before ever trying 
> Double.valueOf(String).

