[jira] [Commented] (PIG-2359) Support more efficient Tuples when schemas are known

Scott Carey (Commented) (JIRA) Tue, 10 Jan 2012 15:19:06 -0800

    [ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183690#comment-13183690 ]


Scott Carey commented on PIG-2359:
----------------------------------

bq. The JVM will detect your checks and not do its own bounds checks if yours are sufficient

More info:
The JVM tries to eliminate array bounds checks, and it can do so in a few ways.
Loop predication, available in JRE 6u22 or so and later, moves bounds checks
from inside a loop to just before the loop when it can. Older JVMs can do
something similar, but in fewer situations. This applies both to the intrinsic
Java checks and to any checks you write yourself. In fact, the JIT tries to
hoist all sorts of loop-invariant code out of a loop, not just bounds checks.
If it can prove that the index passed in is within range, it may eliminate the
bounds check entirely. This can be because the index variable has a known range
(from 0 up to, but not including, arr.length, for example) or because of a few
other conditions.
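
As a rough illustration of the two cases above, here is a minimal Java sketch.
The class and method names are made up for this example, and whether a given
JVM actually removes or hoists the checks depends on its version and flags,
which plain Java code cannot show.

```java
// Loop shapes where HotSpot's JIT can often prove or hoist array bounds
// checks. Whether it actually does depends on the JVM version and flags;
// the code only illustrates the patterns, it does not measure anything.
final class LoopBoundsPatterns {

    // The index runs from 0 up to (but not including) arr.length, so every
    // arr[i] is provably in range and the implicit check can be dropped.
    static long sumAll(int[] arr) {
        long sum = 0;
        for (int i = 0; i < arr.length; i++) {
            sum += arr[i];
        }
        return sum;
    }

    // Here the bound n comes from the caller, so the JIT cannot prove that
    // n <= arr.length. Loop predication may instead replace the per-iteration
    // check with a single range test before the loop; an n that is too large
    // still ends in ArrayIndexOutOfBoundsException.
    static long sumFirst(int[] arr, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += arr[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        System.out.println(sumAll(data));      // 15
        System.out.println(sumFirst(data, 3)); // 6
    }
}
```

Either way the Java semantics are unchanged: an out-of-range index still
throws, the JIT just avoids paying for a check on every iteration.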

For a public virtual method like Tuple.get(), the JIT will almost never be able
to inline the call at the call site, so it may never be able to prove that it
can remove the bounds checks. In this situation there are two fast options:
don't check yourself and let the exception bubble up, or check yourself and
handle the out-of-bounds condition yourself. In general, catching an
index-out-of-bounds exception is slower than checking yourself: the JVM can
prove that its own checks are redundant when yours guard them, and exception
handling is much slower than a code branch.
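
A minimal sketch of the two fast options and the slow one, using a
hypothetical IntFieldsTuple class (not Pig's Tuple interface, and not the code
in the attached patches). The null-returning accessors are only one assumed
way of "handling the condition yourself".

```java
// Hypothetical fixed-width tuple of ints, used only to compare the two
// "fast" strategies with the slow one. Not Pig's Tuple interface.
final class IntFieldsTuple {
    private final int[] fields;

    IntFieldsTuple(int... fields) {
        this.fields = fields.clone();
    }

    // Option 1: no manual check. An invalid index makes the JVM's implicit
    // bounds check throw ArrayIndexOutOfBoundsException, which bubbles up.
    int getUnchecked(int i) {
        return fields[i];
    }

    // Option 2: check explicitly and handle the out-of-range case here.
    // With this guard in place the JIT may treat its own check as redundant.
    Integer getOrNull(int i) {
        if (i < 0 || i >= fields.length) {
            return null;
        }
        return fields[i];
    }

    // Slow pattern: exceptions as control flow. Creating, throwing and
    // catching the exception costs far more than a branch.
    Integer getOrNullViaCatch(int i) {
        try {
            return fields[i];
        } catch (ArrayIndexOutOfBoundsException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        IntFieldsTuple t = new IntFieldsTuple(7, 8, 9);
        System.out.println(t.getUnchecked(1));      // 8
        System.out.println(t.getOrNull(5));         // null
        System.out.println(t.getOrNullViaCatch(5)); // null
    }
}
```

The last two methods return the same results; the difference is only in cost,
and it grows with how often out-of-range indices actually occur.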
When the method may be inlined aggressively (especially small private or
effectively final methods), leaving the manual checks out can be very fast,
since the JVM may be able to prove that no checks are necessary at all at a
given call site.
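
A minimal sketch of the inlining case, using a hypothetical final class
InlinedAccess (not part of Pig). The accessor is small and private, so it is a
likely inlining candidate; once it is inlined into the loop, the index range is
visible to the JIT, which may then drop the check. Nothing here forces that to
happen; it depends on the JIT's inlining decisions.

```java
// Hypothetical final class whose tiny private accessor has no manual check.
final class InlinedAccess {
    private final long[] values;

    InlinedAccess(long... values) {
        this.values = values.clone();
    }

    // Small and private: an easy inlining candidate. No manual check here;
    // a bad index would still throw ArrayIndexOutOfBoundsException.
    private long field(int i) {
        return values[i];
    }

    // After field(i) is inlined, the JIT sees values[i] with
    // 0 <= i < values.length and may eliminate the bounds check.
    long sum() {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += field(i);
        }
        return total;
    }

    public static void main(String[] args) {
        InlinedAccess a = new InlinedAccess(10L, 20L, 30L);
        System.out.println(a.sum()); // 60
    }
}
```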

Variants can be performance tested and refined over time.  It doesn't have to 
be perfect now.

> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch, PIG-2359.2.patch, PIG-2359.3.patch, 
> PIG-2359.4.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are 
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible 
> to avoid this overhead, which would result in significant memory savings.
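
The memory argument in the description can be sketched with two hypothetical
classes (ObjectTuple and IntLongTuple, neither of which is Pig code): one stores
every field as an Object, the other is specialized for a known (int, long)
schema and keeps the values as primitive fields. This only illustrates where
the savings come from; it is not the design in the attached patches.

```java
// Hypothetical illustration of the issue's memory argument; not Pig's API.
final class FieldLayouts {

    // Generic layout: every field is a reference, so a long is typically
    // stored as a separate boxed java.lang.Long referenced from the array.
    static final class ObjectTuple {
        private final Object[] fields;

        ObjectTuple(Object... fields) {
            this.fields = fields.clone();
        }

        Object get(int i) {
            return fields[i];
        }
    }

    // Schema-specific layout for (int, long): the values live directly in
    // the tuple object, with no boxing and no backing array.
    static final class IntLongTuple {
        private final int f0;
        private final long f1;

        IntLongTuple(int f0, long f1) {
            this.f0 = f0;
            this.f1 = f1;
        }

        int getInt0() { return f0; }

        long getLong1() { return f1; }

        // Generic accessor for callers that only know the field index;
        // it boxes on demand instead of storing boxed values.
        Object get(int i) {
            switch (i) {
                case 0: return f0;
                case 1: return f1;
                default: throw new IndexOutOfBoundsException("field " + i);
            }
        }
    }

    public static void main(String[] args) {
        ObjectTuple generic = new ObjectTuple(42, 1_000_000L);
        IntLongTuple typed = new IntLongTuple(42, 1_000_000L);
        System.out.println(generic.get(1).equals(typed.get(1))); // true
    }
}
```

Callers that know the schema can use the typed getters and never box at all;
the generic get() keeps the class usable where only an index is known.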
