[jira] [Commented] (PIG-2359) Support more efficient Tuples when schemas are known

Alan Gates (Commented) (JIRA) Thu, 15 Dec 2011 16:52:02 -0800

    [ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170640#comment-13170640 ]


Alan Gates commented on PIG-2359:
---------------------------------

Comments:

In PrimitiveTuple.get(), I wonder if you'd get faster access if you removed the 
array bounds check.  Java is going to do that for you anyway.  You can catch 
the IndexOutOfBoundsException and rethrow it with a nicer error message.
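A minimal sketch of this suggestion, assuming a simplified int-only tuple backed by an int[]. The class IntArrayTuple and its methods are hypothetical and are not the patch's PrimitiveTuple. get() does no explicit range test. It relies on the JVM's own array bounds check and only translates the exception on the error path. Pig's real Tuple.get() declares the checked ExecException, which an actual implementation would more likely throw. The sketch uses an unchecked exception so it stays self-contained. Whether this is measurably faster depends on the JIT, which can often eliminate redundant checks by itself, so it is worth benchmarking before and after.

```java
/**
 * Hypothetical int-only tuple illustrating the review comment:
 * no explicit bounds check in get(), rely on the JVM's array check instead.
 */
class IntArrayTuple {
    private final int[] fields;

    IntArrayTuple(int size) {
        fields = new int[size];
    }

    int size() {
        return fields.length;
    }

    Object get(int pos) {
        try {
            // The JVM already checks the index on every array access, so an
            // explicit "if (pos < 0 || pos >= fields.length)" test is redundant.
            return fields[pos]; // autoboxed to Integer
        } catch (ArrayIndexOutOfBoundsException e) {
            // Only reached on the error path: add context for the user.
            IndexOutOfBoundsException nicer = new IndexOutOfBoundsException(
                    "Field index " + pos + " is out of range for a tuple of size " + fields.length);
            nicer.initCause(e);
            throw nicer;
        }
    }

    void set(int pos, int value) {
        fields[pos] = value; // same reliance on the built-in bounds check
    }
}
```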

The same comment applies to checking whether the buffer capacity will be 
exceeded by reading the requested field.

This also applies to set().
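The same idea for a buffer-backed layout, again as a hypothetical sketch. LongBufferTuple, getLong and setLong are illustrative names and are not taken from the patch. Fields are stored in a ByteBuffer viewed as a LongBuffer. Its absolute get(int) and put(int, long) already throw IndexOutOfBoundsException when the index is negative or not smaller than the limit, so neither accessor needs its own capacity check. Indexing the LongBuffer view by field, instead of computing byte offsets by hand, also avoids an int overflow in an expression like pos * 8.

```java
import java.nio.ByteBuffer;
import java.nio.LongBuffer;

/** Hypothetical long-only tuple stored in a buffer; not the patch's actual class. */
class LongBufferTuple {
    private final int size;
    // View over a heap ByteBuffer, indexed by field rather than by byte offset.
    private final LongBuffer fields;

    LongBufferTuple(int size) {
        this.size = size;
        this.fields = ByteBuffer.allocate(Math.multiplyExact(size, Long.BYTES)).asLongBuffer();
    }

    long getLong(int pos) {
        try {
            // No explicit capacity check: LongBuffer.get(int) throws
            // IndexOutOfBoundsException if pos < 0 or pos >= limit.
            return fields.get(pos);
        } catch (IndexOutOfBoundsException e) {
            throw outOfRange(pos, e);
        }
    }

    void setLong(int pos, long value) {
        try {
            // put(int, long) applies the same range check.
            fields.put(pos, value);
        } catch (IndexOutOfBoundsException e) {
            throw outOfRange(pos, e);
        }
    }

    // Builds a more descriptive exception, keeping the original as the cause.
    private IndexOutOfBoundsException outOfRange(int pos, IndexOutOfBoundsException cause) {
        IndexOutOfBoundsException e = new IndexOutOfBoundsException(
                "Field index " + pos + " is out of range for a tuple of size " + size);
        e.initCause(cause);
        return e;
    }
}
```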

Does append() ever make sense for these types of tuples?  Should it just throw 
UnsupportedOperationException?
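In Java, the conventional exception for an optional operation that an implementation does not support is java.lang.UnsupportedOperationException, the same one the java.util collections throw. A minimal sketch, assuming a hypothetical fixed-width tuple (FixedIntTuple) whose field count is set by the schema at construction time, so a Tuple-style append(Object) has nothing sensible to do:

```java
/** Hypothetical fixed-width tuple: its field count is fixed by the schema. */
class FixedIntTuple {
    private final int[] fields;

    FixedIntTuple(int size) {
        fields = new int[size];
    }

    int size() {
        return fields.length;
    }

    int get(int pos) {
        return fields[pos];
    }

    void set(int pos, int value) {
        fields[pos] = value;
    }

    void append(Object val) {
        // The layout is fixed by the schema, so growing the tuple is not meaningful.
        throw new UnsupportedOperationException(
                "append() is not supported on fixed-width primitive tuples; use set()");
    }
}
```

Failing loudly here means a caller that assumes a growable tuple finds out immediately instead of getting a silently truncated record.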

In the P*Tuple classes, when a user calls set(int pos, Object o), you are 
forcing o into the type of the tuple (e.g., for PIntTuple you are forcing it 
into an int).  This is a change of semantics from the general tuple contract, 
where whatever you pass to set() is taken to be the value for that field.  I 
would like to understand more about the use case where you would expect to see 
this used.  Is it that you want to force this to int because the data may or 
may not be all ints (e.g., there may be some floats)?  I think it would be 
better to just take an int, and return a null and issue a warning if what you 
get isn't an int.  This still violates the semantics, but at least it doesn't 
silently produce a different result.  If the use case is only internal, such as 
passing data between map and reduce or between MR jobs, then I 
definitely think we should forget all the checks and just assume the data is 
correct.
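A minimal sketch of the alternative proposed above, assuming a hypothetical int-typed tuple (StrictIntTuple). set() stores the value only when it really is an Integer. Otherwise it records a null and logs a warning instead of coercing, for example, a Float into an int. Because a primitive int[] cannot hold null, a parallel boolean[] tracks null fields. java.util.logging stands in for whatever logging or warning mechanism Pig would actually use.

```java
import java.util.Arrays;
import java.util.logging.Logger;

/** Hypothetical int-typed tuple: accept Integers, turn anything else into null. */
class StrictIntTuple {
    private static final Logger LOG = Logger.getLogger(StrictIntTuple.class.getName());

    private final int[] values;
    private final boolean[] isNull; // primitive arrays cannot hold null

    StrictIntTuple(int size) {
        values = new int[size];
        isNull = new boolean[size];
        Arrays.fill(isNull, true); // every field starts out null
    }

    void set(int pos, Object o) {
        if (o instanceof Integer) {
            values[pos] = (Integer) o;
            isNull[pos] = false;
        } else {
            if (o != null) {
                // Do not silently coerce: warn and store null instead.
                LOG.warning("Expected Integer at field " + pos + " but got "
                        + o.getClass().getName() + "; storing null");
            }
            isNull[pos] = true;
        }
    }

    Object get(int pos) {
        // Returns null for null fields, otherwise the boxed int.
        return isNull[pos] ? null : values[pos];
    }
}
```

For the purely internal case (map to reduce, or between MR jobs), the instanceof test and the warning could be dropped entirely, as the comment suggests.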

You added new methods to the TupleFactory class, which is marked as Stable.  
You'll need to provide default implementations of those to avoid breaking 
backward compatibility.
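Because TupleFactory is an abstract class, new methods can carry a concrete body so that existing subclasses keep compiling and behaving as before. A minimal sketch of that pattern, using hypothetical names (SketchTupleFactory, newIntTuple, LegacyTupleFactory) and List&lt;Object&gt; in place of Pig's Tuple to stay self-contained. The default simply falls back to the general-purpose factory method, so an older factory that knows nothing about specialized tuples still returns something usable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for an abstract, Stable factory class. */
abstract class SketchTupleFactory {
    /** Existing abstract method that third-party subclasses already implement. */
    public abstract List<Object> newTuple(int size);

    /**
     * Newly added method with a default body: subclasses written against the
     * old version still compile and simply get a general-purpose tuple.
     */
    public List<Object> newIntTuple(int size) {
        return newTuple(size);
    }
}

/** A pre-existing subclass that was written before newIntTuple() existed. */
class LegacyTupleFactory extends SketchTupleFactory {
    @Override
    public List<Object> newTuple(int size) {
        // A growable list pre-filled with 'size' null fields.
        return new ArrayList<>(Arrays.asList(new Object[size]));
    }
}
```

A factory that does support specialized tuples would override newIntTuple(); everyone else inherits the fallback.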

Why is this patch changing HTTP libraries?  (See the changes to 
ivy/library.properties.)

                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch, PIG-2359.2.patch, PIG-2359.3.patch
>
>
> Pig Tuples have significant overhead because all of their fields are stored as 
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible 
> to avoid this overhead, which would result in significant memory savings.

