Hi guys, I am running into a behavior of flatten that causes pretty significant bugs in certain corner cases. Not sure whether fixing it will cause any backwards-incompatibility issues, so looking for your feedback.
Here's the issue: Flattening a null bag results in the row being dropped. That's fine. Filtering a null tuple results in a single column (with the value null) being produced. That leads to all the columns after the flattened value shifting left by n-1 positions, where n is the number of expected fields in a tuple! Consider: grunt> sh cat tmp/x foo bar a (b,c) d grunt> x = load 'tmp/x' as (a:chararray, b:(b:chararray, c:chararray), d:chararray); grunt> projected = foreach x generate d; grunt> dump projected *(bar) *(d) grunt> flattened = foreach x generate a, flatten(b) as (b, c), d; grunt> dump flattened *(foo,,bar) * -- NOTE THREE FIELDS INSTEAD OF EXPECTED 4 (a,b,c,d) grunt> projected = foreach flattened generate d; grunt> dump projected *() *-- NOTE WRONG VALUE (d) grunt> projected = foreach flattened generate c; *() *-- NOTE THAT, INCONSISTENTLY, C is NULL! AS IS B. (c) I've reproduced this behavior in pig 8 and pig 9 (top of branch)