[
https://issues.apache.org/jira/browse/PIG-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620654#action_12620654
]
Alan Gates commented on PIG-354:
--------------------------------
Here's how output type determination works now: the type checker calls
outputSchema on the UDF. If it gets a schema it goes with that. If not, it
uses reflection to determine the return type of the UDF and then maps that to a
data type. The only case where we can't really tell what the UDF is returning
is if it doesn't declare a schema and its return type is Tuple or Bag. In that
case we have no way to guess what's inside. But all through the code we treat
tuples and bags with unknown contents as containing byte arrays. If we try to
do otherwise just for UDFs, it will be difficult (we'll end up tracking
lineage), so I don't want to change that.
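To make the resolution order above concrete, here is a minimal sketch in plain Java. It does not use Pig's real classes: PigType, Tuple, Bag, Udf, Resolved and resolve are hypothetical stand-ins invented for illustration, and reading the return type from the generic superclass is just one assumed way of doing the reflection step.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class OutputTypeSketch {
    // Hypothetical stand-ins for Pig's data types (not the real constants).
    enum PigType { BYTEARRAY, CHARARRAY, INTEGER, TUPLE, BAG }

    // Hypothetical placeholders for the tuple and bag container classes.
    static class Tuple {}
    static class Bag {}

    // Hypothetical UDF base class: T is the declared return type.
    abstract static class Udf<T> {
        abstract T exec(Object input);
        // Returning null means "this UDF declares no schema".
        PigType outputSchema() { return null; }
    }

    // Resolved output type, plus the assumed type of the contents when the
    // output is a tuple or bag (null otherwise).
    static final class Resolved {
        final PigType type;
        final PigType contents;
        Resolved(PigType type, PigType contents) {
            this.type = type;
            this.contents = contents;
        }
        @Override public String toString() {
            return contents == null ? type.toString() : type + " of " + contents;
        }
    }

    static Resolved resolve(Udf<?> udf) {
        // Step 1: a schema declared by the UDF always wins.
        PigType declared = udf.outputSchema();
        if (declared != null) return new Resolved(declared, null);

        // Step 2: otherwise use reflection to find the return type.
        Type sup = udf.getClass().getGenericSuperclass();
        if (sup instanceof ParameterizedType) {
            Type ret = ((ParameterizedType) sup).getActualTypeArguments()[0];
            if (String.class.equals(ret)) return new Resolved(PigType.CHARARRAY, null);
            if (Integer.class.equals(ret)) return new Resolved(PigType.INTEGER, null);
            // Step 3: a tuple or bag without a schema has unknown contents,
            // which are treated as byte arrays.
            if (Tuple.class.equals(ret)) return new Resolved(PigType.TUPLE, PigType.BYTEARRAY);
            if (Bag.class.equals(ret)) return new Resolved(PigType.BAG, PigType.BYTEARRAY);
        }
        // Anything this sketch does not recognize falls back to bytearray.
        return new Resolved(PigType.BYTEARRAY, null);
    }

    // No schema, return type String.
    static class Upper extends Udf<String> {
        String exec(Object input) { return String.valueOf(input).toUpperCase(); }
    }

    // No schema, return type Tuple: contents cannot be guessed.
    static class MakeTuple extends Udf<Tuple> {
        Tuple exec(Object input) { return new Tuple(); }
    }

    // Declares a schema, so the return type is never inspected.
    static class Typed extends Udf<Object> {
        Object exec(Object input) { return input; }
        @Override PigType outputSchema() { return PigType.INTEGER; }
    }

    public static void main(String[] args) {
        System.out.println(resolve(new Upper()));
        System.out.println(resolve(new MakeTuple()));
        System.out.println(resolve(new Typed()));
    }
}
```

The MakeTuple case is the one the comment is about: nothing in the class says what the tuple holds, so the sketch reports its contents as bytearray.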
The one other area we could change is POUserFunc. Currently, when it has a
type of bytearray, it checks if the object passed back is really a bytearray or
not. If not, it calls toString().getBytes() on it and constructs a
DataByteArray from that. We could do the same check when the type is charray.
Not sure how useful this would be.
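A rough sketch of that check, again in plain Java rather than Pig's actual code. ByteArrayValue is a hypothetical stand-in for DataByteArray, DeclaredType and coerce are invented names, and the explicit UTF-8 charset is an assumption made only so the example behaves the same on every platform.

```java
import java.nio.charset.StandardCharsets;

class CoercionSketch {
    // Hypothetical stand-in for DataByteArray: a thin wrapper around bytes.
    static final class ByteArrayValue {
        final byte[] bytes;
        ByteArrayValue(byte[] bytes) { this.bytes = bytes.clone(); }
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical subset of declared output types.
    enum DeclaredType { BYTEARRAY, CHARARRAY }

    // Coerce whatever the UDF handed back so it matches the declared type.
    static Object coerce(Object result, DeclaredType declared) {
        if (result == null) return null;
        switch (declared) {
            case BYTEARRAY:
                // Existing behaviour described above: accept a real byte
                // array as-is, otherwise go through the string form.
                if (result instanceof ByteArrayValue) return result;
                return new ByteArrayValue(
                        result.toString().getBytes(StandardCharsets.UTF_8));
            case CHARARRAY:
                // Proposed analogous check: accept a real String as-is,
                // otherwise fall back to toString().
                if (result instanceof String) return result;
                return result.toString();
            default:
                return result;
        }
    }

    public static void main(String[] args) {
        System.out.println(coerce(42, DeclaredType.BYTEARRAY).getClass().getSimpleName());
        System.out.println(coerce(42, DeclaredType.CHARARRAY).getClass().getSimpleName());
    }
}
```

The CHARARRAY branch is the only new part; whether it is worth having depends on how often UDFs declared as chararray actually return something other than a String.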
> Change to default outputSchema for UDFs
> ---------------------------------------
>
> Key: PIG-354
> URL: https://issues.apache.org/jira/browse/PIG-354
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Olga Natkovich
> Priority: Critical
> Fix For: types_branch
>
>
> Currently, if a UDF writer does not specify outputSchema, the default is
> bytearray, which is not what you would want most of the time. Making chararray
> the default would make things backward compatible.