[ 
https://issues.apache.org/jira/browse/PIG-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642223#action_12642223
 ] 

Santhosh Srinivasan commented on PIG-505:
-----------------------------------------

Responses with paragraph numbers:

Paragraph 2: 

The current lineage code barfs if the load function is null when converting 
bytearrays to a Pig type. As a result, we have to pick a load function to use, 
which can result in run time errors or erroneous results. Based on your 
comment, it seems appropriate to relax the rule that load functions cannot be 
null for bytearray to Pig type conversions and then throw an appropriate error 
message at run time (assuming no bugs in the lineage code).

Paragraph 3:

The inputs to a cast expression can serve as inputs to any operator that 
expects expressions. As a result, setting the return type of an expression 
operator to unknown will have an across-the-board impact. In order to mitigate 
this impact, we could introduce a new visitor that changes the type of all 
expressions that are not inputs to a cast to bytearray. However, this 
introduces a problem. When do we use this visitor? Before the type checker or 
after the type checker? If we use the visitor before the type checker, then we 
will lose unknown types for casts introduced by the type checker. If we use the 
visitor after the type checker, the type checker will barf if unknown types 
occur in the graph. As a result, we would have to migrate some of the 
functionality of the type checker into the visitor. This approach is 
complicated and not worth the benefit.
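
The ordering problem described above can be illustrated with a toy model. This 
is only a sketch: the names VisitorOrderSketch, Expr, rewriteUnknowns and 
typeCheck are hypothetical and do not correspond to classes in the Pig code 
base. It assumes a type checker that rejects unknown types and inserts a cast 
wherever a non-double expression is used as a double.

```java
/** Toy model of the visitor ordering problem; all names are hypothetical. */
final class VisitorOrderSketch {
    enum Type { UNKNOWN, BYTEARRAY, DOUBLE }

    /** Hypothetical expression node, for example the output of a UDF. */
    static final class Expr {
        Type type;
        boolean feedsCast;          // true once a cast consumes this expression
        final boolean usedAsDouble; // e.g. the operand $0 in "$0 + 2.0"

        Expr(Type type, boolean feedsCast, boolean usedAsDouble) {
            this.type = type;
            this.feedsCast = feedsCast;
            this.usedAsDouble = usedAsDouble;
        }
    }

    /** Assumed visitor: an unknown type that is not a cast input becomes bytearray. */
    static void rewriteUnknowns(Expr e) {
        if (e.type == Type.UNKNOWN && !e.feedsCast) {
            e.type = Type.BYTEARRAY;
        }
    }

    /** Assumed type checker: rejects unknown types and inserts casts where needed. */
    static void typeCheck(Expr e) {
        if (e.type == Type.UNKNOWN) {
            throw new IllegalStateException("type checker cannot handle an unknown type");
        }
        if (e.usedAsDouble && e.type != Type.DOUBLE) {
            e.feedsCast = true; // cast introduced by the type checker
        }
    }

    public static void main(String[] args) {
        // Visitor first: the cast added later sees bytearray, so "unknown" is lost.
        Expr first = new Expr(Type.UNKNOWN, false, true);
        rewriteUnknowns(first);
        typeCheck(first);
        System.out.println("visitor first, cast input type: " + first.type);

        // Type checker first: it fails as soon as it meets the unknown type.
        Expr second = new Expr(Type.UNKNOWN, false, true);
        try {
            typeCheck(second);
            rewriteUnknowns(second);
        } catch (IllegalStateException ex) {
            System.out.println("checker first: " + ex.getMessage());
        }
    }
}
```

In this model neither order preserves the unknown type for a cast that the 
type checker itself introduces, which is why part of the type checker would 
have to move into the visitor.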

Based on the discussions and given the cost implications of code complexity, 
maintenance and performance, the solution is probably the following:

1. Relax the rule of load function not being null in the lineage code.
2. If a null pointer exception occurs in the back end (POCast, specifically) 
then we assume that it was due to a bytearray created by a UDF and report an 
appropriate error message.
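
The two steps above could look roughly like the following sketch. It is not 
the actual POCast code: BytesConverter and CastSketch are hypothetical 
stand-ins for a load function's conversion hook and for the back-end cast 
operator, and the error text is made up. For simplicity it checks for a null 
load function explicitly instead of catching a null pointer exception.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the bytearray conversion offered by a load function. */
interface BytesConverter {
    Double bytesToDouble(byte[] raw);
}

/** Hypothetical stand-in for the back-end cast operator (POCast in Pig). */
final class CastSketch {
    // Step 1: the lineage code is allowed to leave this null.
    private final BytesConverter loadFunc;

    CastSketch(BytesConverter loadFunc) {
        this.loadFunc = loadFunc;
    }

    Double castToDouble(byte[] raw) {
        if (raw == null) {
            return null; // a null field stays null
        }
        if (loadFunc == null) {
            // Step 2: assume the bytearray was created by a UDF and say so.
            throw new IllegalStateException(
                "Cannot cast bytearray to double: no load function is available. "
                + "The bytearray was probably created by a UDF.");
        }
        return loadFunc.bytesToDouble(raw);
    }

    public static void main(String[] args) {
        byte[] field = "2.5".getBytes(StandardCharsets.UTF_8);

        // Lineage traced the field back to a load function: the cast works.
        CastSketch traced = new CastSketch(
            raw -> Double.valueOf(new String(raw, StandardCharsets.UTF_8)));
        System.out.println(traced.castToDouble(field) + 2.0);

        // Lineage found no load function: the cast reports a readable error.
        CastSketch untraced = new CastSketch(null);
        try {
            untraced.castToDouble(field);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```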

The only constraint to this solution is the assumption that the lineage code is 
not buggy. If the lineage code is buggy and we end up with a null load function 
for the right bytearray to Pig type conversion, it will require investigation.

> Lineage for UDFs that do not return bytearray
> ---------------------------------------------
>
>                 Key: PIG-505
>                 URL: https://issues.apache.org/jira/browse/PIG-505
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
>
>
> In Pig-335, the lineage design states that UDFs that return bytearrays could 
> cause problems in tracing the lineage. For UDFs that do not return bytearray, 
> the lineage design should pick up the right load function to use as long as 
> there is no ambiguity.  In the current implementation, we could have issues 
> with scripts like:
> {code}
> a = load 'input' as (field1);
> b = foreach a generate myudf_to_double(field1);
> c =  foreach b generate $0 + 2.0;
> {code}
> When $0 has to be cast to a double, the lineage code will complain that it 
> hit a UDF and hence cannot determine the right load function to use.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
