[ https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated PIG-5452:
------------------------------
    Attachment: pig-5452-v01.patch

Instead of relying on the inner-field schema, the patch uses the output schema, which combines the schema of the data with the user-defined schema.

> Null handling of FLATTEN with user defined schema (as clause)
> -------------------------------------------------------------
>
>                 Key: PIG-5452
>                 URL: https://issues.apache.org/jira/browse/PIG-5452
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5452-v01.patch
>
>
> Follow-up from PIG-5201:
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;
> {code}
> This produces the right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f)
> {code}
>
> However,
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;
> {code}
> This produces the wrong number of nulls, and the output is shifted: the value of a3 lands in the A2 position and the a3 column becomes null.
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,)
> {code}
> The difference is that, in the latter case, a2 in "FLATTEN(a2)" only has the schema tuple() with no inner fields.
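The effect of choosing the field count from the inner schema versus the output schema can be illustrated with a small toy model. This is a hypothetical sketch, not Pig's actual code: the class `FlattenNullSketch` and its methods are invented for illustration. It assumes that FLATTEN of a null tuple emits one null per declared field (falling back to a single null when the schema has no fields, as with `tuple()`), and that the resulting row is then read positionally against the 4-column output schema (a1, A1, A2, a3), with missing trailing fields treated as null. Under those assumptions, taking the field count from the inner schema reproduces the shifted `(a,,a,)` rows, while taking it from the output schema gives `(a,,,a)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of FLATTEN on a null tuple; not Pig's implementation.
final class FlattenNullSketch {

    // Expands a possibly-null tuple into columns. For a null tuple there is no
    // data to read the arity from, so it pads with `arity` nulls. Assumption:
    // an arity of 0 (schema tuple()) falls back to a single null column.
    static List<Object> flatten(List<Object> tuple, int arity) {
        if (tuple != null) {
            return new ArrayList<>(tuple);
        }
        int width = Math.max(arity, 1);
        List<Object> nulls = new ArrayList<>();
        for (int i = 0; i < width; i++) {
            nulls.add(null);
        }
        return nulls;
    }

    // Builds the row "a1, FLATTEN(a2), a3" where a2 is null.
    static List<Object> row(String a1, int arity, String a3) {
        List<Object> out = new ArrayList<>();
        out.add(a1);
        out.addAll(flatten(null, arity));
        out.add(a3);
        return out;
    }

    // Renders the row like Pig's dump (nulls print as empty), reading it
    // positionally against a schema of `schemaWidth` columns. Assumption:
    // fields past the end of the row are read as null.
    static String render(List<Object> row, int schemaWidth) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < schemaWidth; i++) {
            if (i > 0) {
                sb.append(',');
            }
            Object v = i < row.size() ? row.get(i) : null;
            if (v != null) {
                sb.append(v);
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        int innerArity = 0;   // from the inner schema tuple()
        int outputArity = 2;  // from the AS clause (A1:chararray, A2:chararray)
        int schemaWidth = 4;  // output schema: a1, A1, A2, a3
        System.out.println(render(row("a", innerArity, "a"), schemaWidth));  // (a,,a,)
        System.out.println(render(row("a", outputArity, "a"), schemaWidth)); // (a,,,a)
    }
}
```

In this model the only difference between the two rows is where the arity comes from, which mirrors the patch's switch from the inner-field schema to the combined output schema.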