[ 
https://issues.apache.org/jira/browse/PIG-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642241#action_12642241
 ] 

David Ciemiewicz commented on PIG-505:
--------------------------------------

Are we overthinking this problem at this time?

As far as I know, the only source of "undefined" values in user-defined 
functions right now is UDFs that return maps.  (I could be wrong.)

Why don't we just make the following simplifying assumptions (or conventions) 
for right now?

1) UDFs that return maps must return the individual values as bytearray type.  
Period.
2) When the lineage code has to cast one of these map values, it assumes the 
value is a bytearray for conversion purposes.
3) Tell me how to code my UDFs to follow these guidelines and conventions.
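
As a starting point for item 3, here is a minimal Java sketch of the convention 
in items 1 and 2.  It is not Pig code: plain byte[] stands in for Pig's 
bytearray type, and the class and method names (ByteArrayMapConvention, 
toByteArrayValues, castToDouble) are hypothetical, chosen only for 
illustration.  It also assumes each value is encoded as the UTF-8 bytes of its 
string form, so that a later cast can treat it like any other bytearray.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not Pig API: byte[] plays the role of Pig's bytearray.
final class ByteArrayMapConvention {

    // Item 1: every value leaves the UDF as a bytearray, whatever its original type.
    // Assumption: values are encoded as the UTF-8 bytes of their string form.
    static Map<String, byte[]> toByteArrayValues(Map<String, ?> raw) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : raw.entrySet()) {
            Object v = e.getValue();
            // Null values are passed through as null rather than encoded.
            out.put(e.getKey(),
                    v == null ? null : String.valueOf(v).getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }

    // Item 2: a downstream cast can assume bytearray input and convert from there.
    static double castToDouble(byte[] value) {
        return Double.parseDouble(new String(value, StandardCharsets.UTF_8));
    }
}
```

Under this sketch, a map value of 2.5 stored under the key "price" comes back 
as the bytes of "2.5", and castToDouble recovers 2.5 before an expression such 
as $0 + 2.0 is evaluated.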

The other option is to introduce some cast convention that allows me to define 
whether the map will adhere to a bytearray convention or a chararray convention 
to reduce the chance of redundant conversions.

For example -- (map<bytearray>) or (map<chararray>).  Or maybe this is handled 
intrinsically in the function definition.

> Lineage for UDFs that do not return bytearray
> ---------------------------------------------
>
>                 Key: PIG-505
>                 URL: https://issues.apache.org/jira/browse/PIG-505
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
>
>
> In PIG-335, the lineage design states that UDFs that return bytearrays could 
> cause problems in tracing the lineage. For UDFs that do not return bytearray, 
> the lineage design should pick up the right load function to use as long as 
> there is no ambiguity.  In the current implementation, we could have issues 
> with scripts like:
> {code}
> a = load 'input' as (field1);
> b = foreach a generate myudf_to_double(field1);
> c = foreach b generate $0 + 2.0;
> {code}
> When $0 has to be cast to a double, the lineage code will complain that it 
> hit a UDF and hence cannot determine the right load function to use.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.