[
https://issues.apache.org/jira/browse/PIG-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642255#action_12642255
]
Santhosh Srinivasan commented on PIG-505:
-----------------------------------------
Undefined values can also occur in other complex types: tuples and bags. If the
UDF does not implement the outputSchema method and the tuple or bag is
flattened, Pig has to deal with unknowns.
1. Pig prefers that you return the correct type instead of bytearray, unless
your type really is a bytearray.
2. The current implementation already makes that assumption, and we are
continuing to adhere to it.
3. Unless you need bytearrays, return the appropriate type in your Map. Pig
expects a Map<Object, Object>; for the values, please use Integer, Long,
Float, Double, String, DataByteArray, Map, Tuple, or Bag, depending on your use
case. Use DataByteArray only if you will use the value as a bytearray in Pig.
In the future, we might let users specify the value type of the map, and even
both the key and value types, as strongly typed languages do.
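To illustrate point 3, here is a minimal sketch in plain Java of building a
Map<Object, Object> whose values carry concrete types rather than raw bytes.
The class name TypedMapExample, the method buildResult, and the keys "count",
"score", and "payload" are hypothetical and are not part of Pig. To keep the
sketch free of Pig dependencies, a byte[] stands in for DataByteArray; an
actual UDF would presumably build such a map inside its exec method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Pig: shows the shape of a map a UDF could return.
class TypedMapExample {

    // Parses raw text fields into concrete Java types before storing them,
    // so each map value carries a definite type instead of undifferentiated bytes.
    static Map<Object, Object> buildResult(String rawCount, String rawScore, byte[] rawPayload) {
        Map<Object, Object> result = new HashMap<Object, Object>();
        result.put("count", Integer.valueOf(rawCount.trim())); // stored as Integer
        result.put("score", Double.valueOf(rawScore.trim()));  // stored as Double
        // Assumption: only a value that is genuinely opaque bytes stays as bytes.
        // In a Pig UDF this would be wrapped in a DataByteArray instead of a byte[].
        result.put("payload", rawPayload);
        return result;
    }

    public static void main(String[] args) {
        Map<Object, Object> m = buildResult("42", "3.5", new byte[] {1, 2});
        // Print the runtime type of every value in the map.
        for (Map.Entry<Object, Object> e : m.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().getClass().getSimpleName());
        }
    }
}
```

The design choice is simply to do the conversion inside the UDF, where the
intended type is known, rather than leaving Pig to guess it later.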
> Lineage for UDFs that do not return bytearray
> ---------------------------------------------
>
> Key: PIG-505
> URL: https://issues.apache.org/jira/browse/PIG-505
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Santhosh Srinivasan
> Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
>
> In PIG-335, the lineage design states that UDFs that return bytearrays could
> cause problems in tracing the lineage. For UDFs that do not return bytearray,
> the lineage design should pick up the right load function to use as long as
> there is no ambiguity. In the current implementation, we could have issues
> with scripts like:
> {code}
> a = load 'input' as (field1);
> b = foreach a generate myudf_to_double(field1);
> c = foreach b generate $0 + 2.0;
> {code}
> When $0 has to be cast to a double, the lineage code will complain that it
> hit a UDF and hence cannot determine the right load function to use.