[ 
https://issues.apache.org/jira/browse/PIG-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642578#action_12642578
 ] 

Santhosh Srinivasan commented on PIG-505:
-----------------------------------------

Response to David's comments:

The map type in Pig was designed to hold any atomic key type (i.e., string, 
int, float, long, double) and any value type. As a result, the natural 
representation is a Map<Object, Object>.  The UDF has the right outputSchema 
implementation. UDFs that return maps should return Map<Object, Object>.
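
To make the Map<Object, Object> point concrete, here is a minimal sketch in plain Java. It does not use the Pig API; the class name MapReturnSketch and the method buildMap are hypothetical stand-ins for the body of a UDF that returns a map. It only shows that a single Map<Object, Object> can hold keys of different atomic types and values of any type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Pig code: stands in for the part of a UDF
// that builds the map it returns.
class MapReturnSketch {

    // Keys are atomic types (String, Integer, Double here); values may be
    // of any type, so the natural Java type is Map<Object, Object>.
    static Map<Object, Object> buildMap(String name, int count, double score) {
        Map<Object, Object> result = new HashMap<Object, Object>();
        result.put("name", name);   // String key, String value
        result.put(1, count);       // Integer key, Integer value
        result.put(2.5d, score);    // Double key, Double value
        return result;
    }

    public static void main(String[] args) {
        Map<Object, Object> m = buildMap("pig", 3, 0.5);
        System.out.println(m.get("name") + " " + m.get(1) + " " + m.get(2.5d));
    }
}
```

A caller that needs a specific value type still has to cast the Object it gets back, which is where values of an undetermined type become a concern.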

With the proposal in comment 3 
(https://issues.apache.org/jira/browse/PIG-505?focusedCommentId=12642223#action_12642223),
the UDF will work as long as there are no DataByteArray values in the Map that 
require a cast.

Response to Pi's comments:

Treating unknowns as bytearrays will lead to run-time errors, and those errors 
will not go away if we treat unknowns as unknowns instead. What treating 
unknowns as unknowns buys us is better error handling. Specifically, in your 
example, comparing two unknowns can be caught during type checking, whereas 
making them bytearrays might result in a run-time error if and only if the two 
underlying types do not match.
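
The difference between the two failure points can be illustrated with a toy model in plain Java. This is not Pig's type checker; the names UnknownVsBytearraySketch, DeclaredType, typeCheckCompare and runtimeCompare are all made up for illustration. The assumption is simply that a static check sees only declared types, while a run-time comparison sees the actual values.

```java
// Toy model, not Pig code: contrasts an error reported at type-checking
// time with one that only surfaces at run time.
class UnknownVsBytearraySketch {

    enum DeclaredType { UNKNOWN, BYTEARRAY }

    // Static check on declared types only. Returns an error message, or
    // null when the comparison is accepted.
    static String typeCheckCompare(DeclaredType left, DeclaredType right) {
        if (left == DeclaredType.UNKNOWN && right == DeclaredType.UNKNOWN) {
            return "cannot compare two operands of unknown type";
        }
        return null;
    }

    // Run-time comparison on actual values. Fails only when the two
    // underlying types do not match.
    static int runtimeCompare(Object left, Object right) {
        if (!left.getClass().equals(right.getClass())) {
            throw new ClassCastException("mismatched run-time types: "
                    + left.getClass().getName() + " vs "
                    + right.getClass().getName());
        }
        @SuppressWarnings("unchecked")
        Comparable<Object> comparableLeft = (Comparable<Object>) left;
        return comparableLeft.compareTo(right);
    }

    public static void main(String[] args) {
        // Unknown vs. unknown is rejected before anything runs.
        System.out.println(typeCheckCompare(DeclaredType.UNKNOWN, DeclaredType.UNKNOWN));
        // Bytearray vs. bytearray passes the static check ...
        System.out.println(typeCheckCompare(DeclaredType.BYTEARRAY, DeclaredType.BYTEARRAY));
        // ... and fails at run time only if the real types differ.
        System.out.println(runtimeCompare(1, 2));
    }
}
```

In this model, treating both operands as bytearrays lets the script through the check and defers any failure to the data, which matches the coarser error reporting described here.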

Summary: Treating unknowns as bytearrays will result in coarser error messages. 
On the other hand, treating unknowns as unknowns will require significant 
changes without eliminating the possibility of run-time errors.

> Lineage for UDFs that do not return bytearray
> ---------------------------------------------
>
>                 Key: PIG-505
>                 URL: https://issues.apache.org/jira/browse/PIG-505
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
>
>
> In PIG-335, the lineage design states that UDFs that return bytearrays could 
> cause problems in tracing the lineage. For UDFs that do not return bytearray, 
> the lineage design should pick up the right load function to use as long as 
> there is no ambiguity.  In the current implementation, we could have issues 
> with scripts like:
> {code}
> a = load 'input' as (field1);
> b = foreach a generate myudf_to_double(field1);
> c =  foreach b generate $0 + 2.0;
> {code}
> When $0 has to be cast to a double, the lineage code will complain that it 
> hit a UDF and hence cannot determine the right load function to use.

