[ 
https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5452:
------------------------------
    Attachment: pig-5452-v01.patch

Instead of relying on the inner-field schema, the patch uses the output
schema, which combines the schema of the data with the user-defined schema.

> Null handling of FLATTEN with user defined schema (as clause)
> -------------------------------------------------------------
>
>                 Key: PIG-5452
>                 URL: https://issues.apache.org/jira/browse/PIG-5452
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5452-v01.patch
>
>
> Follow-up from PIG-5201:
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 
> as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;{code}
> This produces the right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f) {code}
>  
> However, 
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;{code}
> This produces the wrong number of nulls, and the output is shifted incorrectly.
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,) {code}
> The difference is that, in the latter case, a2 in "FLATTEN(a2)" only has the
> schema tuple(), with no inner fields.
>  
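
The difference between the two outputs reported above can be illustrated with a
small Python toy model. This is only a sketch, not Pig's actual implementation:
the function name flatten_null_row, its parameters, the "one null placeholder
for an empty inner schema" rule, and the "pad short rows up to the declared
width" rule are all assumptions chosen to reproduce the reported rows.

```python
def flatten_null_row(a1, a3, inner_fields, as_fields, use_output_schema):
    """Toy model of 'GENERATE a1, FLATTEN(a2), a3' when a2 is null."""
    # Fields the flattened tuple is declared to have: the 'as' clause if
    # present, otherwise the tuple's own inner fields.
    declared = as_fields if as_fields else inner_fields

    if use_output_schema:
        # Idea described in the patch comment: count nulls from the combined
        # output schema (data schema plus user-defined schema).
        null_count = len(declared)
    else:
        # Assumption: only the inner fields are consulted, and an empty
        # inner schema still yields a single null placeholder.
        null_count = max(len(inner_fields), 1)

    row = [a1] + [None] * null_count + [a3]

    # Assumption: rows shorter than the declared width are padded with nulls.
    declared_width = 2 + len(declared)
    row += [None] * (declared_width - len(row))
    return tuple(row)


# Case 1: a2 declared as tuple(A1, A2), no 'as' clause on FLATTEN.
print(flatten_null_row("a", "a", ["A1", "A2"], [], False))
# Case 2: a2 declared as tuple(), 'as (A1, A2)' on FLATTEN, inner schema only.
print(flatten_null_row("a", "a", [], ["A1", "A2"], False))
# Case 2 again, but counting nulls from the output schema.
print(flatten_null_row("a", "a", [], ["A1", "A2"], True))
```

Under these assumptions, the first and third calls place a3 in the last
position, while the second call shifts it left by one, matching the (a,,a,)
shape in the report.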



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
