[ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170728#comment-14170728
 ] 

Remi Catherinot commented on PIG-3259:
--------------------------------------

Make SanityChecker thread-safe. The current implementation is stateful (because 
of the numDots field) and is not used within a synchronized block, so it is not 
thread-safe. Change sanityCheckIntegerLongDecimal so that it returns a byte: 0 
would mean long/integer/byte/short, 1 would mean double, 2 would mean NaN (not 
a number). Carrying the result in the return value instead of a field would 
make the method thread-safe without slowing down the implementation.
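
A minimal sketch of such a stateless check, under stated assumptions: the class 
name, the constant names and the accepted grammar (optional sign, decimal 
digits, at most one dot) are hypothetical and are not taken from the Pig source. 
The dot counter is a local variable, so concurrent callers share no state.

```java
public final class StatelessSanityCheck {
    // Hypothetical result codes, following the 0/1/2 proposal above.
    public static final byte INTEGRAL = 0;     // long/integer/byte/short
    public static final byte DECIMAL = 1;      // double
    public static final byte NOT_A_NUMBER = 2; // anything else

    private StatelessSanityCheck() {}

    public static byte classify(String str) {
        if (str == null || str.isEmpty()) {
            return NOT_A_NUMBER;
        }
        int len = str.length();
        char first = str.charAt(0);
        int start = (first == '-' || first == '+') ? 1 : 0;

        int numDots = 0;   // local, not a field: nothing is shared between threads
        int numDigits = 0;
        for (int i = start; i < len; i++) {
            char c = str.charAt(i);
            if (c >= '0' && c <= '9') {
                numDigits++;
            } else if (c == '.') {
                if (++numDots > 1) {
                    return NOT_A_NUMBER;
                }
            } else {
                return NOT_A_NUMBER;
            }
        }
        if (numDigits == 0) {
            return NOT_A_NUMBER; // "", "-", "." and similar
        }
        return numDots == 0 ? INTEGRAL : DECIMAL;
    }
}
```

A caller could then switch on the returned byte and only invoke the JVM parser 
for codes 0 and 1.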

Another small speed-up: a test such as if (str.charAt(i)>='0' && 
str.charAt(i)<='9' && .... charAt(i) ... charAt(i) ....) calls charAt several 
times. It can be replaced by declaring a char before the test and using it in 
the test:
char c;
if ( (c=str.charAt(i))>='0' && c<='9' && ... c .... c )
This version calls charAt only once.
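
As a compilable illustration of that pattern (the class, the method names and 
the "digit or dot" condition are hypothetical; the elided parts of the real 
test are not reconstructed here), both variants below read the character once:

```java
public final class CharAtOnce {
    private CharAtOnce() {}

    // Assignment inside the condition, as suggested above.
    public static boolean isDigitOrDot(String str, int i) {
        char c;
        return ((c = str.charAt(i)) >= '0' && c <= '9') || c == '.';
    }

    // Same effect with a plain local variable, which some find easier to read.
    public static boolean isDigitOrDotPlain(String str, int i) {
        final char c = str.charAt(i);
        return (c >= '0' && c <= '9') || c == '.';
    }
}
```

Whether this is measurably faster depends on what the JIT already does with 
repeated charAt calls, so it would be worth benchmarking rather than assuming.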

Also beware: it seems to me that you change the contract of the method. The 
current one tries its best to find a Long; if that fails, it relies fully on 
the JVM parsing (and so on the full specs), which causes the performance 
degradation in case of a bad format (mostly because of the exception). In the 
optimized one, if the check fails, null is returned. We can only do this if we 
are fully confident that the checker strictly follows all the JVM number format 
specs (for example octal long values, hexadecimal values which use 'p' rather 
than 'e' as their exponent operator, etc.).
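
To make the risk concrete, here is a small sketch of inputs that the standard 
JDK parsers accept but that a digits-and-dot pre-check would reject. The 
naiveAccepts method is a hypothetical stand-in for such a pre-check, not the 
checker from the patch. Note that octal is only honoured by Long.decode, while 
Long.parseLong reads "010" as decimal 10.

```java
public final class JvmFormatExamples {
    private JvmFormatExamples() {}

    // Hypothetical naive pre-check: digits with at most one dot, at least one digit.
    public static boolean naiveAccepts(String s) {
        int dots = 0;
        int digits = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                digits++;
            } else if (c == '.') {
                if (++dots > 1) {
                    return false;
                }
            } else {
                return false;
            }
        }
        return digits > 0;
    }

    public static void main(String[] args) {
        // Decimal exponent, hexadecimal float with 'p' exponent,
        // type suffix, and surrounding whitespace.
        String[] inputs = {"1e3", "0x1.8p1", "10d", " 12 "};
        for (String in : inputs) {
            System.out.println("[" + in + "] Double.valueOf=" + Double.valueOf(in)
                    + " naiveAccepts=" + naiveAccepts(in));
        }
        System.out.println("Long.parseLong(\"010\")=" + Long.parseLong("010"));
        System.out.println("Long.decode(\"010\")=" + Long.decode("010"));
    }
}
```

Every input in the array parses successfully with Double.valueOf, yet the naive 
check rejects all of them, so returning null on a failed check would change the 
result for such values.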

Maybe a good way would be to take the code from the src.jar shipped with the 
JVM and replace the "throw NumberFormatException" behavior with a "return null + 
rounding in case of double2long implicit cast" behavior, which is what you want 
to achieve. The JVM is slow in case of a bad format because of the exception, 
but it is the fastest in case of a good format. Only the failure behaviour 
would change.

> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.15.0
>
>         Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric 
> (1234abcd), the code calls Double.valueOf(String) regardless before finally 
> returning null. Any script that inadvertently (user's mistake or not) tries 
> to cast a non-numeric column to int or long would result in many wasteful 
> calls. 
> We can avoid this by only handling the cases where we find the input to be a 
> decimal number (1234.56) and returning null otherwise, even before trying 
> Double.valueOf(String).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
