[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183639#comment-13183639 ]
Scott Carey commented on PIG-2359:
----------------------------------
Performance comments:
bq. In PrimitiveTuple.get(), I wonder if you'd get faster access if you removed
the array bounds check. Java is going to do that for you anyway. You can catch
the IndexOutOfBoundsException and rethrow it with a nicer error message.
That is generally slower.
1. The JVM will detect your checks and not do its own bounds checks if yours
are sufficient.
2. The JVM will profile the method, and compile the checks with the right CPU
branch hints and instruction layout based on the odds that the branch is taken.
3. If it is out of bounds, it is a hundred times faster to find out via an if
statement than a try/catch.
All of the above matter much more inside a loop than for a single access, so
it may not help much here.
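To make the two options concrete, here is a minimal sketch. IntTupleSketch
is a hypothetical stand-in written for this comment, not the PrimitiveTuple
class from the patch, and the error message wording is made up. get() does
the explicit check; getViaCatch() is the catch-and-rethrow variant from the
quoted suggestion.

```java
// Hypothetical stand-in for a primitive-backed tuple; NOT the PrimitiveTuple
// class from the PIG-2359 patches.
final class IntTupleSketch {
    private final int[] fields;

    IntTupleSketch(int... fields) {
        this.fields = fields.clone();
    }

    // Explicit check: a plain branch that the JIT can profile. Because it
    // already covers the array's range, the JIT may be able to skip its own
    // implicit bounds check on the access below.
    int get(int index) {
        if (index < 0 || index >= fields.length) {
            throw new IndexOutOfBoundsException(
                "Field " + index + " requested from a tuple of size " + fields.length);
        }
        return fields[index];
    }

    // Catch-and-rethrow variant: same result on the happy path, but the
    // failure path pays for creating and unwinding two exceptions.
    int getViaCatch(int index) {
        try {
            return fields[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IndexOutOfBoundsException(
                "Field " + index + " requested from a tuple of size " + fields.length);
        }
    }
}
```

Both methods behave the same from the caller's point of view; the difference
is only in what the failure path costs and what the JIT has to work with.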
bq. I did that when I was going to use ByteArrayBuffer, offered by httpcore.
The nice thing about it is that it's resizable, but then again it doesn't have
the r/wLong, r/wInt, etc methods, so I reverted to regular nio.ByteBuffer.
Note, nio.ByteBuffer is 'slow' (but very handy). Unfortunately, all calls to
it are virtual method calls and not inlined. This is because of its dual heap
/ direct nature. If serializing data to a byte[], writing your own private
method to swizzle the int/long into the bytes can give significant performance
gains if that code is a hot spot, since it will be inlined at critical
call sites while ByteBuffer's methods will not.
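A minimal sketch of what such private helpers could look like. The class and
method names are made up for illustration, and big-endian order is an
assumption chosen to match ByteBuffer's default byte order. The methods do
no explicit bounds checking of their own; the caller must size the array.

```java
// Hypothetical helpers for reading and writing ints/longs in a byte[].
// Big-endian layout is assumed so the bytes match what a default-order
// java.nio.ByteBuffer would produce.
final class ByteSwizzleSketch {
    private ByteSwizzleSketch() {}

    // Writes the 4 bytes of v at buf[off..off+3], most significant first.
    static void writeInt(byte[] buf, int off, int v) {
        buf[off]     = (byte) (v >>> 24);
        buf[off + 1] = (byte) (v >>> 16);
        buf[off + 2] = (byte) (v >>> 8);
        buf[off + 3] = (byte) v;
    }

    // Reads 4 bytes starting at off; the & 0xFF undoes byte sign extension.
    static int readInt(byte[] buf, int off) {
        return ((buf[off] & 0xFF) << 24)
             | ((buf[off + 1] & 0xFF) << 16)
             | ((buf[off + 2] & 0xFF) << 8)
             |  (buf[off + 3] & 0xFF);
    }

    // Writes the 8 bytes of v as two big-endian ints, high half first.
    static void writeLong(byte[] buf, int off, long v) {
        writeInt(buf, off, (int) (v >>> 32));
        writeInt(buf, off + 4, (int) v);
    }

    // Reads 8 bytes starting at off; the low half is masked so its sign bit
    // does not leak into the high half.
    static long readLong(byte[] buf, int off) {
        return ((long) readInt(buf, off) << 32)
             | (readInt(buf, off + 4) & 0xFFFFFFFFL);
    }
}
```

These are small, static, and monomorphic, which is what makes them easy
candidates for inlining. Whether that is a measurable win depends on whether
serialization is actually a hot spot in the profile.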
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
> Key: PIG-2359
> URL: https://issues.apache.org/jira/browse/PIG-2359
> Project: Pig
> Issue Type: New Feature
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Attachments: PIG-2359.1.patch, PIG-2359.2.patch, PIG-2359.3.patch,
> PIG-2359.4.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible
> to avoid this overhead, which would result in significant memory savings.