[ 
https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056468#comment-16056468
 ] 

Koji Noguchi commented on PIG-5201:
-----------------------------------

bq.  Have asked Koji Noguchi to check with a couple of internal users who are 
both Pig and data pipeline experts and will be affected by this.

From the users, I learned that there's a common pattern that can easily 
break if FLATTEN(null-bag) starts dropping records as I proposed.

Basically their code looks like
{code}
...
C = FOREACH B GENERATE record_type, FLATTEN(type_a_bag), FLATTEN(type_b_bag); 
...
{code}
When record_type is 'a', type_b_bag is null, and vice versa. 
Instead of checking record_type up front, users simply flatten both bags and 
examine record_type later.

I hate inconsistency and I hate being wrong (and Rohini being right), but it 
looks like I would have to keep the current behavior of FLATTEN(null-bag) _not_ 
dropping.  

> Null handling on FLATTEN
> ------------------------
>
>                 Key: PIG-5201
>                 URL: https://issues.apache.org/jira/browse/PIG-5201
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, 
> pig-5201-v02.patch, pig-5201-v03.patch
>
>
> Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect 
> results.
> Test code/script to follow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
