[ https://issues.apache.org/jira/browse/PIG-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642255#action_12642255 ]

Santhosh Srinivasan commented on PIG-505:
-----------------------------------------

Undefined values can also occur in the other complex types: tuples and bags. If
the outputSchema method in the UDF is not implemented and the tuple or bag is
flattened, Pig has to deal with unknown types.

1. Pig prefers that you return the correct type instead of bytearray, unless
your data really is a bytearray.

2. Pig makes that assumption in the current implementation, and we will
continue to adhere to it.

3. Unless you need bytearrays, return the appropriate types in your Map. Pig
expects a Map<Object, Object>; for the values, please use Integer, Long,
Float, Double, String, DataByteArray, Map, Tuple, or Bag depending on your use
case. Use DataByteArray only if you will use the value as a bytearray in Pig.
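A minimal sketch of point 3, assuming the map values start out as text such as "count=3": each value is stored as an Integer, Long, Double, or String rather than as raw bytes. In a real UDF this logic would sit inside the exec() method of an EvalFunc subclass; to stay runnable without the Pig jar, the sketch shows only the value-typing step, omits Tuple and Bag values, and the names parsePairs and toTypedValue are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedMapSketch {
    // Hypothetical helper: parses "k=v" pairs into a map whose values carry
    // a concrete Java type instead of being left as raw bytes.
    static Map<Object, Object> parsePairs(String line) {
        Map<Object, Object> result = new HashMap<>();
        for (String pair : line.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                continue; // skip malformed entries
            }
            result.put(kv[0].trim(), toTypedValue(kv[1].trim()));
        }
        return result;
    }

    // Tries the narrowest numeric type first, then falls back to String.
    static Object toTypedValue(String raw) {
        try {
            return Integer.valueOf(raw);
        } catch (NumberFormatException e) {
            // not an int; try a wider type
        }
        try {
            return Long.valueOf(raw);
        } catch (NumberFormatException e) {
            // not a long either
        }
        try {
            return Double.valueOf(raw);
        } catch (NumberFormatException e) {
            // not numeric at all
        }
        return raw; // String maps to chararray in Pig
    }

    public static void main(String[] args) {
        Map<Object, Object> m =
            parsePairs("count=3, total=4000000000, ratio=0.5, name=abc");
        for (String k : new String[] {"count", "total", "ratio", "name"}) {
            Object v = m.get(k);
            // prints e.g. "count -> 3 (Integer)", "total -> 4000000000 (Long)"
            System.out.println(k + " -> " + v + " (" + v.getClass().getSimpleName() + ")");
        }
    }
}
```

Trying Integer before Long before Double keeps small counts as Integer while still accepting values that overflow an int.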

In the future, we might let users specify the value type of the map, and
possibly both its key and value types, as strongly typed languages do.

> Lineage for UDFs that do not return bytearray
> ---------------------------------------------
>
>                 Key: PIG-505
>                 URL: https://issues.apache.org/jira/browse/PIG-505
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
>
>
> In PIG-335, the lineage design states that UDFs that return bytearrays could
> cause problems in tracing the lineage. For UDFs that do not return bytearray,
> the lineage design should pick up the right load function to use as long as
> there is no ambiguity. In the current implementation, we could have issues
> with scripts like:
> {code}
> a = load 'input' as (field1);
> b = foreach a generate myudf_to_double(field1);
> c = foreach b generate $0 + 2.0;
> {code}
> When $0 has to be cast to a double, the lineage code will complain that it 
> hit a UDF and hence cannot determine the right load function to use.
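One way to read the rule in the issue description: stop at a UDF that returns bytearray, but trace through a UDF with any other return type and accept the result only if exactly one load function is found. The toy model below is an assumption for illustration, not Pig's lineage code; Source, Load, Udf, and resolveLoadFunc are hypothetical names, and "PigStorage" is used only as an example loader name for the script above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LineageSketch {
    // Hypothetical, simplified lineage node; these are NOT Pig's classes.
    static abstract class Source {}

    // A field that comes straight from a load statement.
    static final class Load extends Source {
        final String loadFunc;
        Load(String loadFunc) { this.loadFunc = loadFunc; }
    }

    // A field produced by a UDF with a declared return type.
    static final class Udf extends Source {
        final String name;
        final String returnType; // e.g. "bytearray", "double"
        final List<Source> inputs;
        Udf(String name, String returnType, Source... inputs) {
            this.name = name;
            this.returnType = returnType;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Returns the single load function the field traces back to; throws if
    // lineage is broken (bytearray UDF) or ambiguous (zero or 2+ loaders).
    static String resolveLoadFunc(Source src) {
        Set<String> loaders = new LinkedHashSet<>();
        collect(src, loaders);
        if (loaders.size() != 1) {
            throw new IllegalStateException("no unique load function: " + loaders);
        }
        return loaders.iterator().next();
    }

    private static void collect(Source src, Set<String> loaders) {
        if (src instanceof Load) {
            loaders.add(((Load) src).loadFunc);
            return;
        }
        Udf udf = (Udf) src;
        if (udf.returnType.equals("bytearray")) {
            // PIG-335: UDFs returning bytearray break lineage, so stop here.
            throw new IllegalStateException("UDF " + udf.name + " returns bytearray");
        }
        // Non-bytearray UDF: keep tracing through its inputs instead of giving up.
        for (Source in : udf.inputs) {
            collect(in, loaders);
        }
    }

    public static void main(String[] args) {
        // a = load 'input' as (field1); b = foreach a generate myudf_to_double(field1);
        Source field1 = new Load("PigStorage");
        Source b0 = new Udf("myudf_to_double", "double", field1);
        System.out.println(resolveLoadFunc(b0)); // prints "PigStorage"
    }
}
```

Collecting loaders into a set means a UDF whose inputs all come from the same loader still resolves, while inputs from different loaders are reported as ambiguous.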

-- 
This message is automatically generated by JIRA.