It is a confusing result, and I certainly vote to fix it. This is a correctness issue, so getting the right behavior is more important than some minor backward-compatibility concerns.
Daniel

On Fri, Dec 9, 2011 at 12:58 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> Hi guys,
> I am running into a behavior of flatten that causes pretty significant bugs
> in certain corner cases.
> Not sure whether fixing it will cause any backwards-incompatibility issues,
> so looking for your feedback.
>
> Here's the issue:
>
> Flattening a null bag results in the row being dropped. That's fine.
>
> Flattening a null tuple results in a single column (with the value null)
> being produced.
>
> That leads to all the columns after the flattened value shifting left by
> n-1 positions, where n is the number of expected fields in a tuple!
>
> Consider:
>
> grunt> sh cat tmp/x
> foo bar
> a (b,c) d
> grunt> x = load 'tmp/x' as (a:chararray, b:(b:chararray, c:chararray),
> d:chararray);
> grunt> projected = foreach x generate d;
> grunt> dump projected
> (bar)
> (d)
>
> grunt> flattened = foreach x generate a, flatten(b) as (b, c), d;
> grunt> dump flattened
> (foo,,bar) -- NOTE THREE FIELDS INSTEAD OF EXPECTED 4
> (a,b,c,d)
> grunt> projected = foreach flattened generate d;
> grunt> dump projected
> () -- NOTE WRONG VALUE
> (d)
> grunt> projected = foreach flattened generate c;
> () -- NOTE THAT, INCONSISTENTLY, C is NULL! AS IS B.
> (c)
>
> I've reproduced this behavior in pig 8 and pig 9 (top of branch)
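
To make the column shift in the quoted report concrete, here is a small Python model of the two behaviors. It is only an illustrative sketch, not Pig's implementation: the function names `flatten_reported` and `flatten_padded` are hypothetical, rows are modeled as plain Python tuples, and the "padded" variant is just one assumed way a fix could behave (emit n nulls for a null tuple with n declared fields).

```python
from typing import List, Optional, Sequence


def flatten_reported(row: Sequence, idx: int) -> List:
    """Model of the reported behavior: a null tuple at position idx
    collapses to a single null column, regardless of its declared width."""
    value: Optional[Sequence] = row[idx]
    expanded = [None] if value is None else list(value)
    return list(row[:idx]) + expanded + list(row[idx + 1:])


def flatten_padded(row: Sequence, idx: int, width: int) -> List:
    """Model of an assumed schema-preserving behavior: a null tuple
    expands to `width` null columns, so later columns keep their positions."""
    value: Optional[Sequence] = row[idx]
    expanded = [None] * width if value is None else list(value)
    return list(row[:idx]) + expanded + list(row[idx + 1:])


# The two rows from the report: (a, b:(b, c), d), with b null in the first row.
rows = [("foo", None, "bar"), ("a", ("b", "c"), "d")]

for row in rows:
    print("reported:", flatten_reported(row, 1))
    print("padded:  ", flatten_padded(row, 1, width=2))
```

In the reported model the first row has three fields, so `d` (expected at position 3) is shifted left by n-1 = 1 to position 2, and a projection of position 3 finds nothing. In the padded model both rows have four fields and `d` stays at position 3.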