Hi guys,
I am running into a behavior of flatten that causes pretty significant bugs
in certain corner cases.
Not sure whether fixing it will cause any backwards-incompatibility issues,
so looking for your feedback.

Here's the issue:

Flattening a null bag results in the row being dropped. That's fine.

Flattening a null tuple results in a single column (with the value null)
being produced.

That leads to all the columns after the flattened value shifting left by
n-1 positions, where n is the number of expected fields in a tuple!

Consider:

grunt> sh cat tmp/x
foo bar
a (b,c) d
grunt> x = load 'tmp/x' as (a:chararray, b:(b:chararray, c:chararray),
d:chararray);
grunt> projected = foreach x generate d;
grunt> dump projected
(bar)
(d)

grunt> flattened = foreach x generate a, flatten(b) as (b, c), d;
grunt> dump flattened
(foo,,bar) -- NOTE THREE FIELDS INSTEAD OF EXPECTED 4
(a,b,c,d)
grunt> projected = foreach flattened generate d;
grunt> dump projected
() -- NOTE WRONG VALUE
(d)
grunt> projected = foreach flattened generate c;
() -- NOTE THAT, INCONSISTENTLY, C IS NULL! AS IS B.
(c)
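
To make the "n-1" arithmetic concrete, here is a rough Python model of the
flattening step as I understand it from the output above. This is only a
sketch: flatten_row is a made-up helper, not Pig code, and it only models how
many fields come out of the flatten. It does not try to explain what later
projections (like the null c above) return.

```python
def flatten_row(row, tuple_index):
    """Hypothetical model of the reported behavior (not Pig's actual code).

    A non-null tuple expands to one field per element, while a null tuple
    collapses to a single null field.
    """
    out = []
    for i, value in enumerate(row):
        if i != tuple_index:
            out.append(value)    # ordinary column, copied through
        elif value is None:
            out.append(None)     # null tuple -> ONE null field
        else:
            out.extend(value)    # non-null tuple -> n fields
    return out


# The two rows from tmp/x, with the missing tuple represented as None.
rows = [
    ("foo", None, "bar"),
    ("a", ("b", "c"), "d"),
]
flattened = [flatten_row(r, 1) for r in rows]

n = 2                 # fields declared in the tuple schema b:(b, c)
expected_width = 4    # declared output schema: a, b, c, d
shift = expected_width - len(flattened[0])

for r in flattened:
    print(len(r), r)
print("shift for the null-tuple row:", shift, "(n - 1 =", n - 1, ")")
```

In this model the first row comes out three fields wide, so "bar" sits at
position 2 while the schema says d is at position 3.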

I've reproduced this behavior in Pig 0.8 and Pig 0.9 (top of branch).
