[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170640#comment-13170640 ]
Alan Gates commented on PIG-2359:
---------------------------------
Comments:
In PrimitiveTuple.get(), I wonder if you'd get faster access if you removed the
array bounds check. Java is going to do that for you anyway. You can catch
the IndexOutOfBoundsException and rethrow it with a nicer error message.
The same comment applies to checking whether the buffer capacity will be
exceeded by reading the requested field.
Both points also apply to set().
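
A minimal Java sketch of the bounds-check suggestion. The class name IntTupleSketch and its int[] backing store are assumptions made for illustration and are not taken from the patch. It only shows the pattern: rely on the JVM's own array check and rethrow with a more descriptive message.

```java
// Hypothetical stand-in for a primitive-backed tuple; not the class from the patch.
final class IntTupleSketch {
    private final int[] fields;

    IntTupleSketch(int size) {
        this.fields = new int[size];
    }

    // No explicit range check here: the JVM already checks every array access.
    int get(int pos) {
        try {
            return fields[pos];
        } catch (ArrayIndexOutOfBoundsException e) {
            // Rethrow with a message that names the field and the tuple size.
            throw new IndexOutOfBoundsException(
                "Requested field " + pos + " of a tuple with " + fields.length + " fields");
        }
    }

    // Same pattern for writes.
    void set(int pos, int value) {
        try {
            fields[pos] = value;
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IndexOutOfBoundsException(
                "Cannot set field " + pos + " of a tuple with " + fields.length + " fields");
        }
    }
}
```

Whether this is measurably faster than an explicit check would need benchmarking, since the JIT can often eliminate redundant range checks on its own.
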
Does append ever make sense for these types of tuples? Should it just throw
NotSupportedException?
In the P*Tuple classes, when a user calls set(int pos, Object o), you are
forcing o into the type of the tuple (e.g., for PIntTuple you are forcing it
into an int). This is a change of semantics from the general tuple contract
where whatever you pass to set is taken to be the value for that field. I
would like to understand more about the use case when you would expect to see
this used. Is it that you want to force this to int because the data may or
may not be all ints (for example, there may be some floats)? I think it would be
better to just take an int, and return a null and issue a warning if what you
get isn't an int. This still violates the semantics, but at least it doesn't
silently produce a different result. If the use case is only for the internal
use of passing data between map and reducer or between MR jobs, then I
definitely think we should forget all the checks and just assume the data is
correct.
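
To make the "null plus a warning" alternative concrete, here is a small Java sketch. Everything in it is assumed for illustration: the class name StrictIntTupleSketch, the boolean null mask, and the use of System.err in place of whatever warning mechanism Pig would actually use. It is not the behavior of the patch.

```java
// Hypothetical int-only tuple that refuses to coerce other types.
final class StrictIntTupleSketch {
    private final int[] values;
    private final boolean[] isNull;   // true where the field holds no value
    private int warnings = 0;

    StrictIntTupleSketch(int size) {
        values = new int[size];
        isNull = new boolean[size];
        java.util.Arrays.fill(isNull, true);
    }

    // Accept only Integer; anything else becomes null and is reported,
    // instead of being silently converted to an int.
    void set(int pos, Object o) {
        if (o instanceof Integer) {
            values[pos] = (Integer) o;
            isNull[pos] = false;
        } else {
            isNull[pos] = true;
            if (o != null) {
                warnings++;
                System.err.println("Field " + pos + ": expected Integer, got "
                    + o.getClass().getName() + "; storing null");
            }
        }
    }

    Object get(int pos) {
        return isNull[pos] ? null : Integer.valueOf(values[pos]);
    }

    int warningCount() {
        return warnings;
    }
}
```

With this sketch, setting a field to 3.5f leaves it null and bumps the warning count, rather than storing 3.
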
You added new methods to the TupleFactory class, which is marked as Stable.
You'll need to provide default implementations of those to avoid breaking
backward compatibility.
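
A hedged sketch of what a default implementation could look like, in Java. StableFactorySketch, LegacyFactorySketch, and newTupleForTypes are hypothetical names, and tuples are modelled as plain Object[] to keep the example self-contained. None of this mirrors the real TupleFactory signatures.

```java
// Hypothetical stand-in for a stable abstract factory; names are illustrative only.
abstract class StableFactorySketch {
    // Pre-existing abstract method that third-party subclasses already implement.
    abstract Object[] newTuple(int size);

    // Newly added method. Because it has a body that falls back to the old
    // path, subclasses written against the old class keep compiling and running.
    Object[] newTupleForTypes(Class<?>[] fieldTypes) {
        return newTuple(fieldTypes.length);
    }
}

// A subclass written before the new method existed; it needs no changes.
final class LegacyFactorySketch extends StableFactorySketch {
    @Override
    Object[] newTuple(int size) {
        return new Object[size];
    }
}
```

Had newTupleForTypes been declared abstract instead, LegacyFactorySketch would no longer compile, which is the kind of breakage the comment warns about.
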
Why is this patch changing http libraries? (See the changes to
ivy/library.properties.)
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
> Key: PIG-2359
> URL: https://issues.apache.org/jira/browse/PIG-2359
> Project: Pig
> Issue Type: New Feature
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Attachments: PIG-2359.1.patch, PIG-2359.2.patch, PIG-2359.3.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible
> to avoid this overhead, which would result in significant memory savings.