[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056468#comment-16056468 ]
Koji Noguchi commented on PIG-5201:
-----------------------------------
bq. Have asked Koji Noguchi to check with couple of internal users who are
both Pig and data pipeline experts and will be affected by this.
From the users, I learned that there's a common pattern that can easily
break if FLATTEN(null-bag) starts dropping records as I proposed.
Basically, their code looks like this:
{code}
...
C = FOREACH B GENERATE record_type, FLATTEN(type_a_bag), FLATTEN(type_b_bag);
...
{code}
When record_type is 'a', type_b_bag is null, and vice versa.
Instead of checking the record_type up front, users simply flatten both bags
and examine the record_type later.
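To make the breakage concrete, here is a rough Python model of the two behaviors. It is only a sketch, not Pig's implementation: the function name flatten_generate, the drop_on_null flag, and the sample records are all made up for illustration, and bag elements are simplified to plain strings instead of tuples. It assumes that several FLATTENs in one GENERATE combine like a cross product, so a FLATTEN that yields zero rows removes the whole record.

```python
from itertools import product

def flatten_generate(record_type, bags, drop_on_null):
    """Model of: FOREACH ... GENERATE record_type, FLATTEN(bag1), FLATTEN(bag2)."""
    expanded = []
    for bag in bags:
        if bag is None:
            # Null bag: contributes zero rows under the proposed behavior,
            # or a single null field under the current behavior.
            expanded.append([] if drop_on_null else [None])
        else:
            expanded.append(list(bag))
    # Cross product of the flattened bags; any empty list yields no rows at all.
    return [(record_type, *combo) for combo in product(*expanded)]

# Hypothetical input: each record fills only the bag matching its record_type.
records = [
    ("a", ["a1", "a2"], None),  # type 'a': type_b_bag is null
    ("b", None, ["b1"]),        # type 'b': type_a_bag is null
]

def run(drop_on_null):
    out = []
    for record_type, type_a_bag, type_b_bag in records:
        out.extend(flatten_generate(record_type, [type_a_bag, type_b_bag], drop_on_null))
    return out

current = run(drop_on_null=False)   # null bag kept as a null field
proposed = run(drop_on_null=True)   # null bag drops the record

print(current)
print(proposed)
```

In this model the current behavior keeps all three flattened rows, with a null in the unused column, while the proposed behavior returns nothing, because every record has exactly one null bag.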
I hate inconsistency and I hate being wrong (and Rohini being right), but it
looks like I will have to keep the current behavior of FLATTEN(null-bag) _not_
dropping records.
> Null handling on FLATTEN
> ------------------------
>
> Key: PIG-5201
> URL: https://issues.apache.org/jira/browse/PIG-5201
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Fix For: 0.18.0
>
> Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch,
> pig-5201-v02.patch, pig-5201-v03.patch
>
>
> Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect
> results.
> Test code/script to follow.