[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056468#comment-16056468 ]
Koji Noguchi commented on PIG-5201:
-----------------------------------

bq. Have asked Koji Noguchi to check with couple of internal users who are both Pig and data pipeline experts and will be affected by this.

From the users, I learned that there is a common pattern that can easily break if FLATTEN(null-bag) starts dropping records as I proposed. Basically, their code looks like

{code}
...
C = FOREACH B GENERATE record_type, FLATTEN(type_a_bag), FLATTEN(type_b_bag);
...
{code}

When record_type is 'a', type_b_bag is null, and vice versa. Instead of checking record_type up front, users simply flatten both bags and examine record_type later. Since one of the two bags is null in each of these records, dropping records on a null bag would discard them.

I hate inconsistency and I hate being wrong (and Rohini being right), but it looks like I will have to keep the current behavior of FLATTEN(null-bag) _not_ dropping records.

> Null handling on FLATTEN
> ------------------------
>
>                 Key: PIG-5201
>                 URL: https://issues.apache.org/jira/browse/PIG-5201
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, pig-5201-v02.patch, pig-5201-v03.patch
>
>
> Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect
> results.
> Test code/script to follow.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)