[jira] [Commented] (PIG-2359) Support more efficient Tuples when schemas are known

Scott Carey (Commented) (JIRA) Tue, 10 Jan 2012 14:25:08 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183639#comment-13183639
 ]


Scott Carey commented on PIG-2359:
----------------------------------

Performance comments:

bq. In PrimitiveTuple.get(), I wonder if you'd get faster access if you removed 
the array bounds check. Java is going to do that for you anyway. You can catch 
the IndexOutOfBoundsException and rethrow it with a nicer error message.

That is generally slower.  
1. The JVM will detect your checks and not do its own bounds checks if yours 
are sufficient.
2. The JVM will profile the method, and compile the checks with the right CPU 
branch hints and instruction layout based on the odds that the branch is taken.
3. If it is out of bounds, it is a hundred times faster to find out via an if 
statement than a try/catch.

All of the above are much more noticeable in a loop than in a single access, 
so it may not help much here.
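
To make the two options concrete, here is a minimal sketch of both styles. 
The class and method names are hypothetical and are not taken from the patch; 
a long[] backing array is assumed purely for illustration. No timings are 
claimed here, only the shape of the code being compared.

```java
// Hypothetical stand-in for a primitive-backed tuple; not the class from the patch.
final class BoundsCheckSketch {
    private final long[] fields;

    BoundsCheckSketch(long[] fields) {
        this.fields = fields;
    }

    // Style 1: explicit check up front. The failure path is a plain branch,
    // and the JIT can use this check to cover the array access below.
    long getChecked(int i) {
        if (i < 0 || i >= fields.length) {
            throw new IndexOutOfBoundsException(
                "Requested field " + i + " of a tuple with " + fields.length + " fields");
        }
        return fields[i];
    }

    // Style 2: rely on the JVM's own check and rethrow with a nicer message.
    // The failure path pays for creating and catching an exception first.
    long getCaught(int i) {
        try {
            return fields[i];
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IndexOutOfBoundsException(
                "Requested field " + i + " of a tuple with " + fields.length + " fields");
        }
    }
}
```

Both methods behave the same from the caller's point of view; the difference 
is only in how the out-of-bounds case is detected.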

bq. I did that when I was going to use ByteArrayBuffer, offered by httpcore. 
The nice thing about it is that it's resizable, but then again it doesn't have 
the r/wLong, r/wInt, etc methods, so I reverted to regular nio.ByteBuffer.

Note, nio.ByteBuffer is 'slow' (but very handy).  Unfortunately, all calls to 
it are virtual method calls and not inlined.  This is because of its dual heap 
/ direct nature.  If serializing data to a byte[], writing your own private 
method to swizzle the int/long into the bytes can have significant performance 
gains if it is a hot-spot in time spent since it will be inlined at critical 
call sites while ByteBuffer's methods will not.
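
A minimal sketch of what such hand-written helpers could look like. The class 
and method names are hypothetical, and big-endian byte order is assumed 
because it matches ByteBuffer's default; whether this is actually faster in a 
given hot spot would need to be measured.

```java
// Hypothetical helper; small static methods like these are easy for the JIT to inline.
final class ByteArrayCodecSketch {
    private ByteArrayCodecSketch() {}

    // Write a 4-byte big-endian int at buf[off..off+3].
    static void writeInt(byte[] buf, int off, int v) {
        buf[off]     = (byte) (v >>> 24);
        buf[off + 1] = (byte) (v >>> 16);
        buf[off + 2] = (byte) (v >>> 8);
        buf[off + 3] = (byte) v;
    }

    // Read a 4-byte big-endian int from buf[off..off+3].
    static int readInt(byte[] buf, int off) {
        return ((buf[off] & 0xFF) << 24)
             | ((buf[off + 1] & 0xFF) << 16)
             | ((buf[off + 2] & 0xFF) << 8)
             |  (buf[off + 3] & 0xFF);
    }

    // Write an 8-byte big-endian long at buf[off..off+7].
    static void writeLong(byte[] buf, int off, long v) {
        for (int i = 0; i < 8; i++) {
            buf[off + i] = (byte) (v >>> (56 - 8 * i));
        }
    }

    // Read an 8-byte big-endian long from buf[off..off+7].
    static long readLong(byte[] buf, int off) {
        long v = 0L;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (buf[off + i] & 0xFFL);
        }
        return v;
    }
}
```

The trade-off is that the caller has to track offsets and grow the array 
itself, which ByteBuffer otherwise handles.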
                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch, PIG-2359.2.patch, PIG-2359.3.patch, 
> PIG-2359.4.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are 
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible 
> to avoid this overhead, which would result in significant memory savings.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
