[ 
https://issues.apache.org/jira/browse/PIG-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239036#comment-13239036
 ] 

Thejas M Nair commented on PIG-2537:
------------------------------------

Thoughts on the solution: Pig should continue to allow and expect null values 
for objects such as tuples. I think the problem needs to be solved in flatten, 
since it is the operator that promises a certain schema and then fails to 
generate data matching that schema when the value is null. But this means that 
flatten needs to be aware of the expected schema for the tuples/bags at run 
time, i.e. the schema would need to be serialized and sent to the backend. 
That change would also not be backward compatible. 
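
To illustrate the idea, here is a minimal Python sketch. It only models the 
semantics and is not Pig's actual Java implementation. The function names and 
the `arity` parameter are hypothetical; `arity` stands in for the serialized 
schema that the backend would need in order to know how many fields a null 
tuple should expand to.

```python
from typing import Any, Optional, Sequence, Tuple


def flatten_current(value: Optional[Sequence[Any]],
                    rest: Sequence[Any]) -> Tuple[Any, ...]:
    """Model of the behaviour reported in the issue: a null tuple
    contributes a single null field, so later fields shift left."""
    if value is None:
        return (None,) + tuple(rest)
    return tuple(value) + tuple(rest)


def flatten_schema_aware(value: Optional[Sequence[Any]],
                         rest: Sequence[Any],
                         arity: int) -> Tuple[Any, ...]:
    """Hypothetical fix: pad a null tuple with one null per declared
    field so the output row always matches the schema width."""
    if value is None:
        return (None,) * arity + tuple(rest)
    return tuple(value) + tuple(rest)


row_rest = ("b_value", "c_value")
shifted = flatten_current(None, row_rest)
padded = flatten_schema_aware(None, row_rest, arity=3)
```

With the schema `a : tuple( x, y, z ), b, c`, `shifted` has three fields 
while `padded` has the five fields that `describe B` promises.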
                
> Output from flatten with a null tuple input generating data inconsistent with 
> the schema
> ----------------------------------------------------------------------------------------
>
>                 Key: PIG-2537
>                 URL: https://issues.apache.org/jira/browse/PIG-2537
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Xuefu Zhang
>            Assignee: Daniel Dai
>             Fix For: 0.11
>
>         Attachments: PIG-2537-1.patch, PIG-2537-2.patch, PIG-2537-3.patch
>
>
> For the following pig script,
> grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
> grunt> B = foreach A generate flatten( $0 ), b, c;
> grunt> describe B;
> B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}
> Alias B has a clear schema.
> However, on the backend, if $0 happens to be null for a row, the output 
> tuple becomes something like 
> (null, b_value, c_value), which is obviously inconsistent with the schema. 
> This behaviour was confirmed by inspecting the Pig code. 
> This inconsistency corrupts data because of position shifts. The expected 
> output row should be something like
> (null, null, null, b_value, c_value).
