[ 
https://issues.apache.org/jira/browse/PIG-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620654#action_12620654
 ] 

Alan Gates commented on PIG-354:
--------------------------------

Here's how output type determination works now:  the type checker calls 
outputSchema on the UDF.  If it gets a schema it goes with that.  If not, it 
uses reflection to determine the return type of the UDF and then maps that to a 
data type.  The only case where we can't really tell what the UDF is returning 
is if it doesn't declare a schema and its return type is Tuple or Bag.  In that 
case we have no way to guess what's inside.  But all through the code we treat 
tuples and bags with unknown contents as containing byte arrays.  If we try to 
do otherwise just for UDFs, it will be difficult (we'll end up tracking 
lineage), so I don't want to change that.
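
To make the lookup order concrete, here is a minimal, self-contained sketch of 
it.  It is not the actual type checker code: the Udf interface, the Tuple and 
Bag stand-ins, the PigType enum, and the "TUPLE of BYTEARRAY" description 
string are all hypothetical names invented for illustration (real UDFs extend 
EvalFunc and return a Schema object).

```java
import java.lang.reflect.Method;
import java.util.List;

public class OutputTypeSketch {
    // Hypothetical, trimmed-down list of data types.
    enum PigType { INTEGER, LONG, DOUBLE, CHARARRAY, BYTEARRAY, TUPLE, BAG }

    // Hypothetical stand-ins for Pig's tuple and bag classes.
    static class Tuple { List<Object> fields; }
    static class Bag { List<Tuple> tuples; }

    // Hypothetical, simplified UDF contract (assumption: null means "no schema declared").
    interface Udf {
        Object exec(Object input);
        default String outputSchema() { return null; }
    }

    // Reflection step: find the return type the UDF class declares for exec().
    // Bridge methods are skipped because they always return Object.
    static Class<?> declaredReturnType(Udf udf) {
        for (Method m : udf.getClass().getDeclaredMethods()) {
            if (m.getName().equals("exec") && !m.isBridge()) {
                return m.getReturnType();
            }
        }
        return Object.class;
    }

    // Map a Java return type to a data type; anything unrecognized is a bytearray.
    static PigType mapJavaType(Class<?> c) {
        if (c == Integer.class) return PigType.INTEGER;
        if (c == Long.class) return PigType.LONG;
        if (c == Double.class) return PigType.DOUBLE;
        if (c == String.class) return PigType.CHARARRAY;
        if (Tuple.class.isAssignableFrom(c)) return PigType.TUPLE;
        if (Bag.class.isAssignableFrom(c)) return PigType.BAG;
        return PigType.BYTEARRAY;
    }

    // Step 1: use the declared schema if there is one.
    // Step 2: otherwise fall back to reflection on the return type.
    // Step 3: a Tuple or Bag without a schema is treated as holding byte arrays.
    static String resolve(Udf udf) {
        String schema = udf.outputSchema();
        if (schema != null) return schema;
        PigType t = mapJavaType(declaredReturnType(udf));
        if (t == PigType.TUPLE || t == PigType.BAG) {
            return t + " of " + PigType.BYTEARRAY;
        }
        return t.toString();
    }

    // Example UDFs, all hypothetical.
    static class Upper implements Udf {
        @Override public String exec(Object input) { return String.valueOf(input).toUpperCase(); }
    }
    static class MakeTuple implements Udf {
        @Override public Tuple exec(Object input) { return new Tuple(); }
    }
    static class DeclaredTuple implements Udf {
        @Override public Tuple exec(Object input) { return new Tuple(); }
        @Override public String outputSchema() { return "(name:chararray, age:int)"; }
    }

    public static void main(String[] args) {
        System.out.println(resolve(new Upper()));         // CHARARRAY
        System.out.println(resolve(new MakeTuple()));     // TUPLE of BYTEARRAY
        System.out.println(resolve(new DeclaredTuple())); // (name:chararray, age:int)
    }
}
```

The MakeTuple case is the one described above: with no declared schema there 
is nothing to inspect inside the tuple, so its fields default to bytearray.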

The one other area we could change is POUserFunc.  Currently, when it has a 
type of bytearray, it checks if the object passed back is really a bytearray or 
not.  If not, it calls toString().getBytes() on it and constructs a 
DataByteArray from that. We could do the same check when the type is chararray.  
Not sure how useful this would be.
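
A rough sketch of what that check looks like, and what the chararray version 
might look like, follows.  This is not the POUserFunc source: ByteArrayValue is 
a hypothetical stand-in for DataByteArray, the method names are made up, and 
the explicit UTF-8 charset and the null pass-through are assumptions added so 
the example behaves the same everywhere.

```java
import java.nio.charset.StandardCharsets;

public class ResultCoercionSketch {
    // Hypothetical stand-in for DataByteArray: a thin wrapper around byte[].
    static final class ByteArrayValue {
        final byte[] data;
        ByteArrayValue(byte[] data) { this.data = data; }
    }

    // Existing behavior as described: if the declared type is bytearray but the
    // UDF handed back something else, convert it via its string form.
    static ByteArrayValue asByteArray(Object result) {
        if (result == null) return null; // assumption: nulls pass through untouched
        if (result instanceof ByteArrayValue) return (ByteArrayValue) result;
        // Assumption: UTF-8 is used here instead of the platform default charset.
        return new ByteArrayValue(result.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Proposed analogue: if the declared type is chararray but the object is
    // not a String, fall back to toString().
    static String asCharArray(Object result) {
        if (result == null) return null; // assumption: nulls pass through untouched
        if (result instanceof String) return (String) result;
        return result.toString();
    }

    public static void main(String[] args) {
        ByteArrayValue b = asByteArray(42);
        System.out.println(new String(b.data, StandardCharsets.UTF_8)); // 42
        System.out.println(asCharArray(3.5));                           // 3.5
    }
}
```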

> Change to default outputSchema for UDFs
> ---------------------------------------
>
>                 Key: PIG-354
>                 URL: https://issues.apache.org/jira/browse/PIG-354
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Priority: Critical
>             Fix For: types_branch
>
>
> Currently, if a UDF writer does not specify outputSchema, the default is 
> bytearray, which is not what you would want most of the time. Making chararray 
> the default would make things backward compatible.
