[ 
https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171315#comment-13171315
 ] 

Alan Gates commented on PIG-2359:
---------------------------------

bq. The use case isn't just internal, I started this in the first case because 
I needed to construct large tuple bags in a UDF. My reasoning for taking int 
value was that this is what we do when people "cast" a float to an int in pig. 
If you declare the schema to be an int, and put in a float... seems to me like 
having an int come out is ok. Could also die abruptly. I think null would be 
most surprising of the available choices.

When will these specialized tuple types get used?  Pig will use them internally 
when we expect a bag (or whatever) to contain that type.  Users can use them in 
UDFs they construct.  Are there other cases where we envision them being 
used?  I agree my "push it to null" is just as arbitrary as your "push to the 
type I expected".  I shy away from failing jobs on these kinds of errors 
because you hate for one row in a billion to fail an entire job.  I guess I'm 
ok with your approach, though I think it should issue a warning (since it seems 
clear the user expected to find only one type in the data), and I think the 
Javadoc comments on the set() functions should clearly declare that this 
instance of the function bends the semantics of the interface.
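
To make that concrete, here is a minimal sketch of what such a set() might 
look like.  This is not code from the attached patches: the class name 
IntTupleSketch, its fields, and the coercionCount() helper are all 
hypothetical, and throwing on non-numeric values is an assumption of the 
sketch rather than something settled in this discussion.

```java
import java.util.Arrays;
import java.util.logging.Logger;

/** Hypothetical int-only tuple, for illustration only; not Pig's actual class. */
class IntTupleSketch {
    private static final Logger LOG =
            Logger.getLogger(IntTupleSketch.class.getName());

    private final int[] values;      // primitive storage: no Object per field
    private final boolean[] isNull;  // an int cannot be null, so track nulls here
    private int coercions = 0;       // how many values were narrowed to int

    IntTupleSketch(int size) {
        values = new int[size];
        isNull = new boolean[size];
        Arrays.fill(isNull, true);   // fields start out null
    }

    /**
     * Sets a field. NOTE: this bends the usual set() contract. A value that
     * is a Number but not an Integer is narrowed to int, as a cast would do,
     * and a warning is logged instead of storing the value as given.
     */
    void set(int fieldNum, Object val) {
        if (val == null) {
            isNull[fieldNum] = true;
            return;
        }
        if (!(val instanceof Number)) {
            // Assumption of this sketch: non-numeric input is rejected.
            throw new IllegalArgumentException(
                    "Expected a numeric value, got " + val.getClass().getName());
        }
        if (!(val instanceof Integer)) {
            coercions++;
            LOG.warning("Field " + fieldNum + ": coercing "
                    + val.getClass().getSimpleName() + " to int");
        }
        values[fieldNum] = ((Number) val).intValue();
        isNull[fieldNum] = false;
    }

    /** Returns the field as a boxed Integer, or null if the field is unset. */
    Integer get(int fieldNum) {
        return isNull[fieldNum] ? null : Integer.valueOf(values[fieldNum]);
    }

    /** Number of set() calls that had to narrow a non-Integer value. */
    int coercionCount() {
        return coercions;
    }
}
```

With this sketch, set(0, 3.7f) stores 3 and logs one warning, while set(1, 5) 
stores 5 silently, so a single stray float does not fail the job but is 
still visible in the logs.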

Performance-wise, this looks very exciting.
                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch, PIG-2359.2.patch, PIG-2359.3.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are 
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible 
> to avoid this overhead, which would result in significant memory savings.
