[ https://issues.apache.org/jira/browse/PIG-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217406#comment-16217406 ]
Koji Noguchi commented on PIG-2537:
-----------------------------------

[~daijy], the patch no longer applies cleanly, so I am mostly guessing here. Am I reading your patch correctly that it only works when the tuple is read as a bytearray from PigStorage and then cast to a tuple? (And that you are creating the correct number of nulls inside the cast?)

I was looking at this issue in PIG-5201 (along with flatten on a null bag and a null map), and I believe my patch there takes the approach that Thejas described 5 years back.
bq. I think the problem needs to be solved in flatten

> Output from flatten with a null tuple input generating data inconsistent with the schema
> ----------------------------------------------------------------------------------------
>
>                 Key: PIG-2537
>                 URL: https://issues.apache.org/jira/browse/PIG-2537
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Xuefu Zhang
>            Assignee: Daniel Dai
>             Fix For: 0.18.0
>
>         Attachments: PIG-2537-1.patch, PIG-2537-2.patch, PIG-2537-3.patch
>
>
> For the following pig script,
> grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
> grunt> B = foreach A generate flatten( $0 ), b, c;
> grunt> describe B;
> B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}
> Alias B has a clear schema.
> However, on the backend, if $0 happens to be null for a row, the output tuple becomes something like
> (null, b_value, c_value), which is obviously inconsistent with the schema.
> The behaviour is confirmed by pig code inspection.
> This inconsistency corrupts data because of position shifts. The expected output row should be something like
> (null, null, null, b_value, c_value).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
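
For readers who want to see the position shift from the issue description in isolation, here is a minimal sketch in Python. It is not Pig's implementation and not taken from any of the attached patches; the function names `flatten_naive` and `flatten_padded` and the `arity` parameter are hypothetical, and the sketch assumes the declared schema tells us how many fields the tuple should have (3 for `a : tuple( x, y, z )`).

```python
from typing import Any, List, Optional, Sequence


def flatten_naive(row: Sequence[Any], flatten_idx: int) -> List[Any]:
    """Models the reported behaviour: a null tuple contributes one null."""
    out: List[Any] = []
    for i, field in enumerate(row):
        if i == flatten_idx and isinstance(field, tuple):
            # Non-null tuple: splice its fields into the output row.
            out.extend(field)
        else:
            # A null at flatten_idx falls through here as a single field,
            # so every later column shifts left.
            out.append(field)
    return out


def flatten_padded(row: Sequence[Any], flatten_idx: int, arity: int) -> List[Any]:
    """Models the expected behaviour: a null tuple expands to `arity` nulls.

    `arity` is an assumption of this sketch: the number of fields the
    schema declares for the tuple being flattened.
    """
    out: List[Any] = []
    for i, field in enumerate(row):
        if i == flatten_idx:
            inner: Optional[tuple] = field
            if inner is None:
                # Pad so the output width still matches the schema.
                out.extend([None] * arity)
            else:
                out.extend(inner)
        else:
            out.append(field)
    return out


row_with_null = (None, "b_value", "c_value")
print(flatten_naive(row_with_null, 0))
print(flatten_padded(row_with_null, 0, 3))
```

With the null row, the first call yields three fields and the second yields five, which matches the two rows contrasted in the description. Padding inside the flatten step, rather than inside a cast, is the direction suggested in the quoted "needs to be solved in flatten" remark, since it does not depend on how the tuple was loaded.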